1. Introduction
Amid the intensifying challenges of global environmental pollution, ecological change, and the energy crisis, developing renewable energy has emerged as the key strategy for transitioning from fossil fuels to a cleaner energy paradigm [1]. Optimizing the energy structure by incorporating renewable energy into the supply system can effectively alleviate the problems caused by the overuse of fossil fuels. Solar energy, distinguished by its extensive global distribution and abundant availability, has seen a substantial rise in penetration within electrical power and energy systems [2]; modern power systems are typically multi-energy systems, and photovoltaic generation can contribute a significant share of their supply. Nevertheless, solar power generation is intermittent, fluctuating, and stochastic, so mitigating these uncertainties and effectively predicting solar power generation have become urgent needs [3]. The foremost factor behind the instability of solar power output is the variability of solar irradiance, and solar irradiance forecasting therefore plays a pivotal role in addressing this challenge [4,5]. Predicting solar energy resources over a future horizon has thus become an important measure for optimizing energy management systems. For example, accurate irradiance forecasts support the careful planning of energy storage charging schedules and the optimization of energy transmission, thereby reducing energy losses [6]. Furthermore, solar irradiance prediction enables photovoltaic power generation to be anticipated in advance, reducing the required reserve capacity and overall generation costs [7]. More broadly, irradiance forecasting not only accommodates the intrinsic instability of solar power generation but also allows energy systems to operate securely, economically, and efficiently [8]. Hence, accurate solar irradiance forecasting is of central importance in electrical power and energy systems [9], mitigating the challenges posed by the dynamic nature of solar energy and strengthening the reliability of energy systems [10].
So far, research on solar irradiance prediction can be categorized into four types: physical methods, statistical methods, machine learning, and deep learning [11]. Physical methods employ the principles of meteorology and numerical weather models to simulate atmospheric conditions and predict irradiance variations. This approach accounts for complex meteorological interactions and reflects a deep understanding of atmospheric processes [12], making it suitable for long-term forecasts. The research in Ref. [13] combines numerical weather prediction with support vector regression for large-scale photovoltaic power forecasts and satellite-based cloud motion vector forecasts. The study in Ref. [14] introduces an approach to generating real-time current-voltage characteristics and forecasting the peak power of photovoltaic modules under actual meteorological conditions using the power-law model and single-diode model parameters. However, physical methods suffer from high model complexity, high computational cost, and sensitivity to initial conditions, resulting in relatively low prediction accuracy [15]. Statistical prediction methods use historical meteorological data and solar energy system output information to forecast future solar irradiance with statistical models. The research reported in Ref. [16] introduces an ARMAX model with exogenous inputs to forecast photovoltaic power output, significantly improving prediction accuracy compared to the traditional ARIMA model. A new SARIMA model to forecast hourly wind speeds in the coastal areas of Scotland has been proposed [17], demonstrating superior accuracy in predicting future offshore wind-speed time series compared to deep learning-based algorithms. While statistics-based forecasting methods offer high computational efficiency and can handle some nonlinear relationships well, their ability to capture complex meteorological dynamics is limited, leading to relatively poor accuracy in long-term forecasting [18,19].
Machine learning methods predict solar irradiance by training neural network models using historical data and are capable of handling nonlinear relationships and large-scale inputs [20]. Compared to physical methods and statistical models, machine learning-based prediction models better capture the complex relationships among input data, thereby improving their accuracy and suitability for medium- and short-term forecast tasks [21]. In a previous study [22], an irradiance prediction method with an integrated framework of robust local mean decomposition and bidirectional long short-term memory is proposed. The study reported in Ref. [23] presents a novel multibranch attentive gated recurrent residual network that is capable of modeling data at various resolutions, extracting the hierarchical features, and capturing short- and long-term dependencies. A novel approach is proposed in Ref. [24] to predict one-week-ahead half-hourly photovoltaic power output in the United Kingdom, leveraging sloped extra-terrestrial irradiance and weather data, enabling the better balancing of electricity supply and demand. Another study [25] introduces a novel hourly stepwise forecasting framework for solar irradiance, employing an integrated hybrid model combined with error correction and variational mode decomposition, significantly enhancing the model's anti-interference capability and prediction accuracy. However, as the number of model layers increases, machine learning-based prediction models encounter challenges such as the curse of dimensionality and slow network convergence [26].
Deep learning models automatically extract features and are suitable for complex nonlinear problems, possessing strong expressive power to handle large-scale, high-dimensional data [27]. They are particularly well suited for short-term and real-time prediction [28]. The authors of Ref. [29] propose a prediction model based on dual decomposition with error correction, an improved hybrid deep learning strategy. This method adopts complete ensemble empirical mode decomposition with adaptive noise together with variational mode decomposition: the historical sequence and the error sequence of solar radiation are decomposed, and the short- and long-term features of the data are extracted by a BiLSTM deep learning network, which effectively improves the prediction accuracy. In another study [30], a hybrid solar irradiance forecasting model based on partial mutual information, an enhanced whale optimization algorithm, and deep reinforcement learning is proposed, which, compared to traditional methods and single deep learning models, can more effectively address dynamic variations and exhibits superior performance across multiple forecasting horizons. A deep learning framework utilizing convolutional neural networks and attention mechanisms to extract the spectral information from geostationary satellites for accurate ground-level solar irradiance estimation is proposed in Ref. [31], outperforming traditional databases. The authors of Ref. [32] propose selecting input variables using the information gain factor to enhance the accuracy of solar irradiance prediction models, validating its superiority over the Pearson correlation coefficient. A comprehensive review of deep learning for renewable energy forecasting can be found in Ref. [33]. Traditional deep learning algorithms suited to time series data, such as recurrent neural networks (RNNs), transmit information between hidden layers through linear or nonlinear activation functions to capture effective features from historical sequences. However, this mode of information propagation tends to dilute crucial features with noise, leading to vanishing or exploding gradients and thereby constraining the ability of such models to handle long time series, including solar irradiance sequences. Although some deep learning algorithms, such as long short-term memory (LSTM) networks, incorporate memory and forget gates to selectively filter the input information [34], the fundamental operational logic of the network remains unchanged, so the accumulation of errors and data noise cannot be completely mitigated.
The attention mechanism is a pivotal technique for processing sequential data and has been widely applied in domains such as natural language processing and computer vision [35]. By modeling the correlations among different positions within input sequences, it automatically learns the importance weights of those positions and prioritizes the more relevant ones during information transmission [36]. This makes attention mechanisms particularly effective at handling long sequences and capturing internal dependencies [37]. ProbSparse attention (PSA), a representative attention mechanism, derives the Q, K, and V matrices from the encoding layer and subsequently obtains attention values using correlation scores [38]. Despite PSA's widespread use in natural language processing and image generation, no studies on its application to solar irradiance prediction have been reported. The ProbSparse attention mechanism computes attention weights for each position within the input sequences, facilitating interaction and information transmission between positions and thereby effectively mitigating cumulative errors and data noise [39], which makes it highly suitable for solar irradiance prediction tasks. In this paper, a solar irradiance prediction model is designed, which makes the following core contributions:
- We have innovatively designed an artificial intelligence model for short-term solar irradiance prediction, leveraging the ProbSparse attention mechanism to efficiently capture the inherent short-term and long-term dependencies within input sequences.
- The dingo algorithm has been redesigned to optimize the hyperparameters of the proposed AI prediction model, enhancing the convergence and performance of the prediction model.
- A comprehensive data preprocessing method incorporating feature selection, multiple imputation, and median filtering is introduced to ensure the quality and accuracy of input data.
The performance of the proposed prediction model has been evaluated using the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2). The simulation results demonstrate that the proposed AI prediction model exhibits higher applicability and practicality in solar irradiance prediction, offering new solutions for addressing challenges in the energy management of hybrid electrical systems.
2. PSA-Based Solar Irradiance Prediction Model
The subject of this paper is the time series of global horizontal irradiance. The sequence features are mainly reflected in the long-term and short-term correlations at various points. The value of irradiance exhibits clear periodic similarity with the variation in solar zenith angle, and the sequence values do not diverge but, rather, only change within a certain range [40]. Therefore, in the case of a sufficient number of historical sequence samples, it can be assumed that the sequence to be predicted is, to some extent, contained within the historical sequence, or shares a similar distribution with a certain part of the historical sequence [41].
The prediction model used in this paper is based on the ProbSparse attention mechanism and, thus, adopts an encoder-decoder architecture that is suitable for this mechanism in the model structure. The overall structure of the model can be divided into the embedding layer, feature extraction layer, and regression layer, as shown in Figure 1.
2.1. Embedding Layer
The embedding layer is the starting layer of the model structure, and its main purpose is to encode the input sequence and transform it into a form that can be processed by the feature extraction layer. Since input time sequences are typically long, they are often batched into a set of shorter sequence inputs. In this article, the embedding layer transforms the input sequence group into a high-dimensional tensor using three embedding methods.
The first approach is numerical embedding, wherein the scalar inputs from the original input sequence are fed into a Conv1d layer. Through convolutional operations, high-dimensional vectors in the same format as the other embedding sequences are generated. This allows the retention of numerical features from the input sequence, and the vectors are then summed with the other embedding sequences for subsequent processing. This layer can be represented by the following formula:
where $P(X)$ and $P(Y)$ denote the probabilities of $X$ and $Y$ occurring, respectively, and $P(X,Y)$ denotes the joint probability of $X$ and $Y$ occurring simultaneously.
The second method is positional embedding. The core mechanism used by the feature extraction layer is the attention mechanism, which effectively provides a relevance look-up table: swapping the positions of any two elements in the table does not change the attention values, only where they appear. The attention mechanism therefore cannot, by itself, extract positional relationship features from the input sequence, and cannot effectively capture the long-term features of the original input sequence, so the embedding layer needs to provide positional information for the subsequent calculations. This can be done by encoding the positions of the input vectors with sine and cosine functions. Even index positions can be encoded as:
$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{\,2i/d_{\mathrm{model}}}}\right)$$
Odd index positions can be encoded as:
$$PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{\,2i/d_{\mathrm{model}}}}\right)$$
In this context, PE represents the positional encoding sequence, pos indicates the position of an element in the sequence, i indexes the embedding dimension, and $d_{\mathrm{model}}$ represents the size of the hidden layer, i.e., the dimensionality of the sequence after positional embedding.
The third step is time embedding. Since the irradiance sequence is time series data, the time series usually records the exact moment of each set of data through time units such as the year, month, day, hour, and minute. This exact time information can provide new features for model training, so the embedding layer is required to integrate this feature information into the input sequence.
The encoded input vector set is obtained by adding the positional encoding to the sequence encoding. The encoded input vectors contain both the inherent sequence features and the positional information. These vectors are then passed through a trained linear layer that maps the sequence features into the $Q$, $K$, and $V$ matrices, which serve as the input to the feature extraction layer.
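As an illustration of the three embedding steps described in this subsection, the following PyTorch sketch sums a Conv1d value embedding, a fixed sinusoidal positional embedding, and a linear time-feature embedding. The layer sizes, kernel size, and set of calendar features are illustrative assumptions, and the subsequent linear projection to the $Q$, $K$, and $V$ matrices is omitted.

```python
import math
import torch
import torch.nn as nn

class InputEmbedding(nn.Module):
    """Minimal sketch: value (Conv1d) + positional (sin/cos) + time embeddings are summed."""
    def __init__(self, c_in, d_model=512, max_len=5000, n_time_feats=5):
        super().__init__()
        # Numerical embedding: Conv1d maps each time step's c_in scalar features to a d_model vector.
        self.value_emb = nn.Conv1d(c_in, d_model, kernel_size=3, padding=1)
        # Positional embedding: fixed sinusoidal table (even dims -> sin, odd dims -> cos).
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(0, max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        # Time embedding: linear projection of calendar features (month, day, hour, ...).
        self.time_emb = nn.Linear(n_time_feats, d_model)

    def forward(self, x, x_time):
        # x: (batch, seq_len, c_in); x_time: (batch, seq_len, n_time_feats)
        v = self.value_emb(x.permute(0, 2, 1)).permute(0, 2, 1)   # value embedding
        p = self.pe[: x.size(1)].unsqueeze(0)                     # positional embedding
        t = self.time_emb(x_time)                                 # time embedding
        return v + p + t

# Toy usage: a batch of 8 sequences, 96 steps, 16 features, 5 calendar features.
emb = InputEmbedding(c_in=16)
out = emb(torch.randn(8, 96, 16), torch.randn(8, 96, 5))          # shape (8, 96, 512)
```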
2.2. Feature Extraction Layer
In deep learning models, the feature extraction layer is often a key component for achieving predictive capabilities. In this article, the feature extraction layer consists of an encoder and a decoder. The encoder-decoder structure enables training, validation, and prediction for time series data.
2.2.1. Encoder
The encoder consists of three self-attention layers. In the traditional self-attention mechanism, the multi-head attention mechanism processes the feature matrices $Q$, $K$, and $V$ of the input vectors to obtain the attention values and then uses these attention values to make sequence predictions. The $Q$ matrix is the query matrix, with one of its vectors referred to as the q vector. The $K$ matrix is the key matrix, with one of its vectors known as the k vector. The $V$ matrix is the value matrix, and one of its vectors is called the v vector. The $Q$, $K$, and $V$ matrices are all obtained through the training of the embedding layer model. For each input vector, its query vector is dot-multiplied with the key vectors of all input vectors to acquire the respective relevance scores. These relevance scores are then transformed into a probability form through softmax, and the outcomes are used to weight and combine the value vectors to derive the output sequence. This part of the structure can be represented by Figure 2, and this part of the process can be expressed by Equation (4):
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $\mathrm{Attention}(Q,K,V)$ refers to the attention value obtained after calculation and $d_k$ is the dimension of the key vectors. The product of the $Q$ and $K^{T}$ matrices is known as the correlation score; this score is normalized by dividing by $\sqrt{d_k}$, and its probability form is obtained through softmax.
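For reference, a minimal NumPy sketch of the conventional scaled dot-product attention in Equation (4) is given below, assuming the $Q$, $K$, and $V$ matrices have already been produced by the embedding layer; the shapes and values are toy examples.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # correlation scores
    weights = softmax(scores, axis=-1)               # probability form
    return weights @ V

# Toy usage: 96 positions, 64-dimensional heads.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((96, 64)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)          # shape (96, 64)
```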
When calculating the self-attention values, the query vector of each encoded input vector needs to be dot-produced with the key vectors of all input vectors. If the length of the original input sequence is long, this will lead to excessive computational load in the encoding layer, affecting the efficiency of the model. Therefore, ProbSparse attention improves the attention mechanism based on traditional attention. In reality, not every key vector has a strong correlation with each query vector. Experimental evidence shows that most query vectors have little correlation with key vectors, and their dot product values tend to approach zero, making this part of the computation meaningless. The ProbSparse attention mechanism optimizes self-attention by effectively reducing the computational load of the attention mechanism if it can find effective pairings in the entire sequence.
The ProbSparse attention mechanism searches for informative query vectors by measuring the sparsity of the $Q$ matrix, rephrasing the original self-attention in probabilistic terms as follows:
$$\mathcal{A}(q_i, K, V) = \sum_{j} p(k_j \mid q_i)\, v_j$$
where $q_i$ represents the $i$-th query vector and $p(k_j \mid q_i)$ represents the distribution of the key vectors conditioned on $q_i$. The effective information content of the $Q$ and $K$ matrices can be measured by the distribution of key vectors under the $i$-th query vector. Using the probability form of self-attention in Equation (5), this distribution function can be expressed in the form of Equation (6):
$$p(k_j \mid q_i) = \frac{\exp\!\big(q_i k_j^{T}/\sqrt{d}\big)}{\sum_{l} \exp\!\big(q_i k_l^{T}/\sqrt{d}\big)}$$
If this distribution is close to a uniform distribution, the query vector is "lazy" and cannot contribute effectively to the attention values. Conversely, if the distribution fluctuates strongly, the query vector contributes significantly to the calculation of attention values. The problem can therefore be transformed into measuring the similarity between distributions, and this similarity can be measured by the KL divergence. By calculating the KL divergence between the distribution of key values conditioned on the $i$-th query vector and the uniform distribution, the sparsity of each query vector can be obtained. The KL divergence can be obtained with Equation (7):
$$KL\big(q \,\|\, p\big) = \ln \sum_{l} e^{\,q_i k_l^{T}/\sqrt{d}} \;-\; \frac{1}{L_K}\sum_{j} \frac{q_i k_j^{T}}{\sqrt{d}} \;-\; \ln L_K$$
where the uniform distribution is taken as $q(k_j \mid q_i) = 1/L_K$, with $L_K$ the number of key vectors, so that the final term $\ln L_K$ is an arbitrary constant.
From this calculation, the required query vectors can be filtered. The model extracts the required query and key vectors through self-attention distilling, which is realized by a maximum pooling layer of the following form:
$$X_{j+1} = \mathrm{MaxPool}\Big(\mathrm{ELU}\big(\mathrm{Conv1d}\big([X_j]_{AB}\big)\big)\Big)$$
where $[X_j]_{AB}$ contains the key operations in the ProbSparse attention mechanism, $\mathrm{ELU}$ is the activation function, given by Equation (9):
$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x}-1), & x \le 0 \end{cases}$$
and $\mathrm{MaxPool}$ indicates the maximum pooling operation.
The new attention mechanism obtained in this way is called ProbSparse Attention (PSA). In the actual experimental process, PSA also has a positive impact on prediction accuracy. After the PSA mechanism is calculated in the encoding layer, a new encoding sequence will be output, which contains the transformed positional encoding information and “semantic” encoding information.
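To make the sparsity measurement concrete, the following NumPy sketch scores each query with a max-minus-mean approximation of the KL-divergence criterion described above and keeps only the top-$u$ "active" queries. The exact scoring form follows the Informer formulation of ProbSparse attention and is an assumption here, not necessarily the precise variant used in this model.

```python
import numpy as np

def query_sparsity(Q, K, sample_k=None, rng=None):
    """Approximate sparsity score M(q_i, K) = max_j(q_i.k_j/sqrt(d)) - mean_j(q_i.k_j/sqrt(d)).

    Queries whose key distribution is far from uniform (large M) are "active" and kept;
    "lazy" queries are dropped, which is what reduces the attention cost.
    """
    L_K, d = K.shape
    rng = rng or np.random.default_rng(0)
    if sample_k is not None:                # optionally score against a random key subset
        K = K[rng.choice(L_K, size=sample_k, replace=False)]
    scores = Q @ K.T / np.sqrt(d)           # (L_Q, L_K) correlation scores
    return scores.max(axis=1) - scores.mean(axis=1)

# Keep the top-u queries with the largest sparsity scores.
rng = np.random.default_rng(1)
Q, K = rng.standard_normal((96, 64)), rng.standard_normal((96, 64))
M = query_sparsity(Q, K, sample_k=32)
u = 24
active_idx = np.argsort(M)[-u:]             # indices of the informative queries
```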
2.2.2. Decoder
The decoder consists of a masked ProbSparse attention layer and a ProbSparse attention layer. During training, the input to the decoder is the encoded target sequence, while during prediction the target sequence is unknown information yet to be predicted. However, the decoder still requires an input sequence, which is generally a randomly generated data sequence. Since the self-attention mechanism can handle global data, in order to avoid the influence of future random data on the prediction process, the masked ProbSparse attention layer is needed to process the randomly generated input sequence. This involves retaining only the confirmed data that have been predicted and masking out the future random interference data. This step can be represented as follows:
$$X_{de} = \mathrm{Concat}\big(X_{token},\, X_{0}\big)$$
where $X_{de}$ is the decoder input sequence, whose form is consistent with the input sequence, $X_{token}$ is the sequence to be trained, and $X_{0}$ is the placeholder of the target sequence.
The traditional transformer network applies "dynamic decoding" in its decoding layer: owing to the characteristics of NLP problems, it can only output results serially, which lowers the output efficiency of the program. The present work, however, addresses a time series problem rather than an NLP problem, so the decoder adopted here uses a single forward pass that outputs all sequence positions in parallel, thereby improving the output efficiency of the model.
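A short sketch of constructing the decoder input described above is given below: the last few known values serve as the start token and zero placeholders stand in for the horizon to be predicted, which the masked attention then hides from earlier positions. The lengths used here mirror those adopted later in the case study and are illustrative only.

```python
import numpy as np

def build_decoder_input(history, label_len=48, pred_len=1):
    """X_de = Concat(X_token, X_0): last label_len observed steps + zero placeholders."""
    x_token = history[-label_len:]                       # confirmed, already-observed data
    x_zero = np.zeros((pred_len,) + history.shape[1:])   # placeholder for the target sequence
    return np.concatenate([x_token, x_zero], axis=0)

history = np.random.rand(96, 16)        # 96 past steps, 16 features
x_dec = build_decoder_input(history)    # shape (49, 16); masking hides the placeholder part
```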
2.2.3. Regression Layer
The regression layer’s task is to reconstruct the initial sequence from the encoded sequence that is derived following the decoding process. In the embedding layer, the input sequence is transformed into a 512-dimensional encoded vector, and, after model computation, it still remains a high-dimensional vector. Therefore, the regression layer is required to perform inverse embedding on the vectors.
In this paper, a fully connected layer is used to implement the functionality of the regression layer. The fully connected layer can be represented as follows:
$$y = f\big(Wx + b\big)$$
where $y$ is the output sequence, $f$ is the activation function, and $W$ and $b$ are the weight matrix and bias vector, respectively. The activation function can be chosen as required, and, by training and iteratively updating the values of the weight matrix and bias vector, sequence restoration can be achieved.
3. Feedforward Network and Parameter Iteration Method
To achieve the best prediction performance, certain hyperparameters of the network model need to be fine-tuned in practical applications. Different target input sequences differ in the complexity of their features, their length, and the strength of noise disturbance, among other factors. Therefore, the hyperparameters that determine the interpretability of the model will affect the accuracy of its predictions. Take, for instance, the number of attention heads, which influences the model's capacity to grasp global dependencies in time series data: too few attention heads may cause the model to miss intricate relationships within the data, whereas too many may cause it to overfit. Likewise, other parameters that can affect prediction accuracy include the number of encoding and decoding layers, the dimensionality of the hidden layers in the feedforward network, and the sequence lengths of the encoder and decoder inputs.
Therefore, optimizing the hyperparameters in the model’s feedforward network can effectively improve the model’s performance and enhance its generalizability, enabling it to achieve good results when dealing with different datasets.
3.1. Dingo Optimizer (DOX) Algorithm Optimization
The DOX is an optimization algorithm that takes n-heads, d-model, and c-out as its three-dimensional input in order to obtain the optimal hyperparameters for the current dataset. The overall prediction model serves as the fitness function, with the MSE used as the optimization criterion, so the fitness function outputs the MSE; the algorithm then selects the best solution by minimizing this value.
The operating principle of DOX is shown in Figure 3. Its framework shares similarities with the grey wolf algorithm, as their position update methods and optimal solution search approaches are fundamentally similar. However, the grey wolf algorithm lacks a greedy mechanism, leading to slower convergence. The DOX, by contrast, integrates the grey wolf optimizer with genetic algorithm principles, adding a screening procedure that reduces the impact of search points lying too far from the target point. In order to preserve the native algorithm's exploration ability and prevent it from getting stuck in local optima, the DOX introduces supplementary position-update strategies.
The dingo algorithm begins by partitioning the initial population into three distinct groups using two randomly produced values, rand_1 and rand_2. Each group undergoes a different position update behavior, with the encirclement behavior involving a search near the current global optimal position. The formula for updating positions in this approach closely mirrors that of the grey wolf algorithm, as depicted in Equation (14).
The hunting behavior performs a localized search around the best solution found so far. Unlike the encircling behavior, in which the movement of each individual is guided by the collective actions of the group, producing a consistent pattern of positional shifts across the swarm, the hunting behavior updates positions based on the previous iteration's optimal solution and the location of a randomly chosen member of the population. This introduces a stochastic element into the positional updates, which reduces the likelihood of converging on a suboptimal local solution. This method of updating positions is captured in Equation (15).
The mechanism for updating the position in the search behavior is not contingent upon the optimal solution; it entails exploring the space between the current position and a position chosen at random, which, in turn, amplifies the stochastic nature of the position updates. This refinement boosts the model’s ability to explore and reduces the probability of settling for an inferior solution. This idea is quantitatively represented in Equation (16).
Following the updating of positions, a fitness-based selection process is employed where points with a fitness below a certain criterion are subject to further position updates. This selective strategy enhances the model’s rate of convergence. The hyperparameters to be optimized in this paper are all integers; therefore, the search capabilities required for the optimization algorithm are relatively low. Due to the complexity of the deep learning model, when the model itself is used as the fitness function, a higher convergence speed is required for the optimization algorithm. The dingo optimizer algorithm is more suitable for this optimization task.
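The following Python sketch illustrates how such a DOX-style search can wrap the prediction model as a fitness function. It is schematic only: the position-update rules are simplified stand-ins for Equations (14)-(16), the search bounds are illustrative, and `train_and_validate` is a hypothetical placeholder for training the full PSA model and returning its validation MSE.

```python
import numpy as np

def train_and_validate(n_heads, d_model, c_out):
    # Hypothetical stand-in for the real fitness function: in practice this would train the
    # PSA prediction model with the given hyperparameters and return its validation MSE.
    return (n_heads - 8) ** 2 * 1e-2 + (d_model - 256) ** 2 * 1e-5 + (c_out - 1) ** 2 * 1e-1

rng = np.random.default_rng(42)
bounds = {"n_heads": (4, 16), "d_model": (128, 512), "c_out": (1, 8)}  # illustrative integer ranges
names = list(bounds)
lo = np.array([bounds[k][0] for k in names], float)
hi = np.array([bounds[k][1] for k in names], float)

def fitness(x):
    return train_and_validate(**{k: int(v) for k, v in zip(names, x)})

def clip(x):
    return np.clip(np.rint(x), lo, hi)   # keep candidates integer and inside the bounds

pop = np.array([clip(p) for p in rng.uniform(lo, hi, size=(10, 3))])
fit = np.array([fitness(p) for p in pop])
best = pop[fit.argmin()].copy()

for _ in range(30):
    for i in range(len(pop)):
        r1, r2 = rng.random(), rng.random()
        j = rng.integers(len(pop))
        if r1 < 0.5:       # "encirclement": move towards the current best solution
            cand = best - rng.random() * np.abs(rng.random() * best - pop[i])
        elif r2 < 0.5:     # "hunting": combine the best solution with a random individual
            cand = 0.5 * (best + pop[j]) + rng.normal(size=pop.shape[1])
        else:              # "searching": explore between the current and a random position
            cand = pop[i] + rng.random() * (pop[j] - pop[i])
        cand = clip(cand)
        f = fitness(cand)
        if f < fit[i]:     # greedy screening keeps only improving moves
            pop[i], fit[i] = cand, f
    best = pop[fit.argmin()].copy()

print({k: int(v) for k, v in zip(names, best)}, "validation MSE:", fit.min())
```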
3.2. Parameter Iteration Method
In the process of model construction, there are certain parameters that need to be learned. During training, these parameters are optimized and updated at each iteration to ensure that the model achieves better predictive performance. To enable iterative updates to the parameters, the model requires a closed-loop feedback system, which necessitates the use of the backpropagation algorithm and the Adam optimization algorithm.
3.2.1. Backpropagation Algorithm
The backpropagation algorithm is applied to those layers of the network that require parameter optimization. The MSE is selected as the error measure; the error of each layer is computed during forward propagation, the gradient of the loss with respect to each parameter is obtained through the chain rule, and gradient descent is then used to minimize the loss, thereby training the parameters. This process can be represented by Equation (17):
$$w' = w - \eta\,\frac{\partial E}{\partial w}$$
where $w'$ is the updated parameter, $E$ is the overall error of the optimized part of the model, $w$ is the weight parameter to be optimized, and $\eta$ is the learning rate.
3.2.2. Adam Optimization Algorithm
The Adam algorithm is an optimization algorithm that automatically updates the learning rate. Updating the learning rate at each epoch can improve the training effectiveness of the model and alleviate potential overfitting or underfitting. The Adam algorithm uses the exponentially decaying moving average of the first moment (Equation (18)) and of the second moment (Equation (19)):
$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$
$$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}$$
By combining these two, the final update formula is obtained, as shown below:
$$w_t = w_{t-1} - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \theta}, \qquad \hat{m}_t = \frac{m_t}{1-\beta_1^{t}}, \quad \hat{v}_t = \frac{v_t}{1-\beta_2^{t}}$$
where $m_t$ is the first-order momentum of the current step, $m_{t-1}$ is the first-order momentum of the previous step, $v_t$ is the second-order momentum of the current step, $v_{t-1}$ is the second-order momentum of the previous step, and $g_t$ is the current gradient. $\beta_1$ and $\beta_2$ are the two decay hyperparameters, $\theta$ is the coefficient that prevents the denominator from approaching 0, and $\alpha$ is the model learning rate.
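A minimal NumPy implementation of the standard Adam update described by Equations (18)-(20) is sketched below; here $\theta$ plays the role of the small constant that keeps the denominator away from zero, and the toy quadratic loss is purely illustrative.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=1e-4, beta1=0.9, beta2=0.999, theta=1e-8):
    """One Adam update: exponentially decayed first/second moments plus bias correction."""
    m = beta1 * m + (1 - beta1) * grad          # Eq. (18): first-order momentum
    v = beta2 * v + (1 - beta2) * grad ** 2     # Eq. (19): second-order momentum
    m_hat = m / (1 - beta1 ** t)                # bias-corrected moments
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + theta)
    return w, m, v

# Toy usage on a single weight vector with a simple quadratic loss.
w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 101):
    grad = 2 * (w - np.array([1.0, -2.0, 0.5]))  # gradient of ||w - target||^2
    w, m, v = adam_step(w, grad, m, v, t)
```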
4. Data Preprocessing
Generally, due to the influence of equipment or human factors, the photovoltaic power generation data recorded by photovoltaic power stations may exhibit phenomena such as repetition, missing data, or anomalies [42]. Therefore, data preprocessing is required before using the dataset.
4.1. Feature Selection
The use of multidimensional time series data in this study involves the incorporation of environmental data that exhibits a high correlation with the historical irradiance data, aside from the irradiance data itself. This inclusion serves to enhance the predictive accuracy of the model. Therefore, the selection of environmental data sequences with high correlation to the irradiance series in the dataset is one of the crucial steps in data preprocessing.
The dataset used in this paper has 21 features; apart from the global horizontal irradiance, the remaining 20 features can be regarded as environmental features. Certain features, such as the solar elevation angle, diffuse irradiance, and azimuthal irradiance, show a strong linear relationship with the horizontal irradiance. In contrast, characteristics such as wind speed, wind direction, and humidity exhibit a less pronounced linear correlation with solar irradiance. Despite the lower linear correlation of some features with the horizontal irradiance, they still exhibit a strong correlation with the target feature. When training the model to predict future irradiance, features with a strong nonlinear correlation are more meaningful. Therefore, this research employs mutual information to quantify the relationship between the environmental variables and the sequence of global horizontal irradiance:
$$I(X;Y) = D_{KL}\big(P(X,Y)\,\big\|\,P(X)P(Y)\big) = \sum_{x,y} P(x,y)\,\log\frac{P(x,y)}{P(x)P(y)}$$
where $D_{KL}$ denotes the Kullback-Leibler divergence, $P(X,Y)$ is the joint distribution of the two variables, and $P(X)P(Y)$ is the product of their independent (marginal) distributions.
Evaluating the mutual information between the sequence of global horizontal irradiance and other sequences of environmental characteristics allows for the identification of features that strongly correlate with the target sequence and are suitable for use as model inputs.
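As a sketch of this step, scikit-learn's mutual_info_regression can rank candidate features against the global horizontal irradiance target; the synthetic data and feature names below are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Synthetic stand-in for the station data: "ghi" is the target and the other
# columns play the role of environmental features (names are illustrative).
rng = np.random.default_rng(0)
n = 5000
zenith = rng.uniform(0, 90, n)
ghi = np.cos(np.radians(zenith)) ** 2 * 1000 + rng.normal(0, 20, n)
df = pd.DataFrame({
    "ghi": ghi,
    "solar_zenith": zenith,               # strongly (nonlinearly) related to ghi
    "humidity": rng.uniform(20, 100, n),  # weakly related
    "wind_speed": rng.gamma(2.0, 2.0, n), # essentially unrelated
})

X, y = df.drop(columns=["ghi"]), df["ghi"]
mi = mutual_info_regression(X, y, random_state=0)
ranking = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(ranking)   # features with the highest mutual information are kept as model inputs
```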
4.2. Data Cleaning
Datasets often contain errors, duplicates, anomalies, or missing data. Such flawed data can strongly affect model predictions, so the data must be cleaned during preprocessing. The main problem with the dataset adopted in this paper is a large amount of missing data; however, the dataset has multiple features with high linear correlation, and the missing values are randomly distributed among the features.
Therefore, the multiple imputation method is used to complete the dataset. Multiple imputation estimates the missing values by building models, such as linear regression based on the sequences containing the missing values and decision trees based on the related features. In general, multiple imputation requires that the proportion of missing values is not too large, to ensure the accuracy of the constructed models. The proportion of missing values in the dataset used in this paper is about 4%, so the multiple imputation method can be used.
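A possible realization of this step is scikit-learn's IterativeImputer, which regresses each incomplete feature on the others; with posterior sampling it approximates the multiple-imputation idea, although it is not necessarily the exact procedure used here. The synthetic data and the 4% missing rate below are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer

# Synthetic correlated features with ~4% of values removed at random,
# mimicking the situation described for the station dataset.
rng = np.random.default_rng(0)
base = rng.normal(size=(2000, 1))
data = pd.DataFrame(base @ np.ones((1, 4)) + rng.normal(scale=0.1, size=(2000, 4)),
                    columns=["f1", "f2", "f3", "f4"])
data_missing = data.mask(rng.random(data.shape) < 0.04)

# Each incomplete feature is regressed on the other features; repeated refinement
# rounds (and posterior sampling) approximate the multiple-imputation idea.
imputer = IterativeImputer(max_iter=10, sample_posterior=True, random_state=0)
filled = pd.DataFrame(imputer.fit_transform(data_missing), columns=data.columns)

print("missing before:", data_missing.isna().mean().mean(), "after:", filled.isna().mean().mean())
```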
4.3. Filtering
The primary objective of filtering is to diminish noise and remove anomalies. For this research, the dataset is subjected to median filtering following multiple imputations, successfully eradicating the biases that arise from the imputation of missing values while preserving the integrity of the original data’s curve and boundaries.
The process of median filtering can essentially be broken down into two main stages. Initially, it is crucial to establish the dimensions of the filter window, a decision that markedly influences the outcome of the filtering. An inadequately sized window might not adequately remove substantial noise and disturbances, whereas an excessively large window could compromise the integrity of the original data and impair predictive accuracy. Consequently, the optimal window size should be chosen in consideration of the noise and signal properties.
After establishing the window size, the median filtering process is carried out on every feature in the dataset. The window moves systematically through the dataset, from beginning to end, and the contained data points are sorted. Subsequently, the median of these points is adopted as the filtered result. This procedure can be formulated mathematically as:
$$y_k = \mathrm{med}\big(x_{k-w}, \ldots, x_k, \ldots, x_{k+w}\big)$$
where $y_k$ denotes the outcome of the filtering process, $x_{k-w}, \ldots, x_{k+w}$ signify the data contained in the filter's window, and the $\mathrm{med}(\cdot)$ function computes the median value of the data within the window.
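A brief SciPy sketch of column-wise median filtering is shown below; the window size of 5 is an illustrative choice and should be tuned as discussed above.

```python
import numpy as np
from scipy.signal import medfilt

def median_filter_columns(data, window=5):
    """Apply y_k = med(x_{k-w}, ..., x_{k+w}) independently to every feature column."""
    if window % 2 == 0:
        raise ValueError("median filter window must be odd")
    return np.column_stack([medfilt(data[:, j], kernel_size=window)
                            for j in range(data.shape[1])])

x = np.random.rand(1000, 16)
x[100, 3] = 50.0                    # an injected spike that the filter should suppress
x_smooth = median_filter_columns(x)
```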
5. Performance Metrics
In the process of training and evaluating deep learning models, various statistical metrics are commonly used to measure their predictive performance. These include the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R2). The MSE is obtained by calculating the average of the squared differences between the predicted and actual values. In the model constructed in this paper, the MSE is used as the evaluation metric during the training process. The MSE is sensitive to outliers and can effectively correct large errors. Therefore, the MSE is used as an evaluation index for model training and hyperparameter optimization in this paper.
During the forecasting phase, the assessment of a model's efficacy is multifaceted. Consequently, this study employs the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R2) as metrics to gauge the model's predictive capabilities. The MAE quantifies the mean absolute discrepancy between the predictions and actual outcomes, which is valuable when greater robustness is required. The RMSE computes the square root of the average squared deviations between the predicted and actual data. The coefficient of determination primarily helps in assessing how closely the model's predictive trajectory matches the actual data trajectory, which is instrumental in comparing the predictive precision of different models applied to the same dataset.
The MAE can be calculated using the following formula:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\big|y_i - \hat{y}_i\big|$$
The MSE and RMSE can be calculated as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^{2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^{2}}$$
R2 can be calculated as follows:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\big(y_i - \hat{y}_i\big)^{2}}{\sum_{i=1}^{n}\big(y_i - \bar{y}\big)^{2}}$$
where $n$ is the total number of samples, $\hat{y}_i$ is the predicted value, $y_i$ is the true value, and $\bar{y}$ is the mean of the true values.
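For reference, the same metrics can be computed with scikit-learn, as in the following minimal example with toy irradiance values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([420.0, 510.0, 0.0, 305.0])   # example irradiance values (W/m^2)
y_pred = np.array([400.0, 530.0, 5.0, 290.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```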
6. Case Study
In order to verify the performance of the proposed model on an actual forecasting task, we used data from the Vaulx-en-Velin photovoltaic power station in Lyon, France. This dataset provides the irradiance of the photovoltaic power station for the whole year of 2018, together with 21 environmental characteristic series, which provides a sufficient volume of data for model training.
6.1. Data Description and Simulation Setting
The dataset used in this study is sourced from the Vaulx-en-Velin photovoltaic power station in Lyon, France. The Vaulx-en-Velin region is located at 45.7786° N latitude and 4.9225° E longitude, and it has a temperate maritime climate with distinct seasonal variations. The region exhibits characteristic temperate maritime climate features in terms of temperature, annual precipitation, and relative humidity throughout the year. Solar radiation is higher in the summer and autumn, and lower in the spring and winter seasons. The dataset provides solar irradiance and environmental characteristics data for the entire year of 2018. The total number of data points in this dataset is 525,450, consisting of 16 features.
According to the DOX algorithm and the ProbSparse attention-based PV prediction algorithm described above, the input parameters of the prediction model were selected using the mutual information method, of which 16 input parameters were finally used. The model's learning rate was updated in real time using the Adam optimizer in each training iteration, with the initial learning rate set to 0.0001. For this research, traditional time series forecasting methods, namely LSTM, CNN, and the Elman neural network, were chosen as benchmark algorithms.
6.2. Irradiance Prediction Based on ProbSparse Attention
Based on the theory presented in Section 3, an experimental model for solar irradiance simulation prediction was constructed. Prior to prediction, it was essential to optimize the model's hyperparameters to ensure the effectiveness of the predictions. The choice of hyperparameters varies for different datasets and significantly impacts the model's interpretability and training efficiency, thereby influencing the prediction results. The critical hyperparameters that were fine-tuned were the count of attention heads, the size of the model, the batch size for the training data, and the learning rate of the optimizer. To bolster the model's capacity for generalization and to confirm its robustness on various datasets, we utilized the DOX algorithm for the tuning of hyperparameters. Following DOX-based optimization, the experiment yielded the following optimal hyperparameter setup: 15 attention heads, a model dimension of 512, an input sequence length of 96, a training batch size of 32, and an initial learning rate for the optimizer of 0.001. The split ratio between the training set and the validation set was 8:2.
The input sequence length was set to 96, and the label length for the decoding layer input was set to 48. Predictions for the next timestep's solar irradiance value were made using a sliding time window approach. The first 80% of the dataset was used as the training set, while the remaining 20% was used as the validation set to evaluate the model. Typical daily sequences were randomly selected from the obtained prediction results, and the predicted images were compared with the original solar irradiance sequence images, as shown in Figure 4 and Figure 5. It is evident that the predicted sequences closely align with the original sequences, effectively enabling short-term solar irradiance prediction. Additionally, a scatter plot comparing the predicted images with the original sequences was generated using statistical methods, as illustrated in Figure 6. The plot demonstrates that the majority of predicted data points linearly coincide with the original data and exhibit a clear convergence trend, confirming the model's capability to effectively predict solar irradiance sequences.
6.3. Comparison with the Traditional Time Series Prediction Model
At present, prediction models such as LSTM, GRU, BP, CNN, and the Elman neural network are typically used in time series prediction tasks. For this paper, the LSTM, CNN, and Elman neural network models were selected as the control models, and the prediction performance of the ProbSparse attention-based irradiance prediction model was analyzed against them.
Data on global horizontal irradiance, as recorded by photovoltaic facilities, is subject to a range of environmental conditions, including the dry bulb temperature, zenith brightness, and atmospheric humidity. For this research, the dataset was sourced from the Lyon area in France, which is characterized by a temperate oceanic climate and marked by well-defined seasonal changes. The impact of these environmental variables is pronounced across the different seasons, and fluctuations in certain factors can alter the patterns of global irradiance measurements at photovoltaic plants. Consequently, the experimental data can be considered to exhibit different patterns depending on the season. During the comparative testing of the LSTM and CNN models, the data were segmented into four distinct subsets corresponding to spring, summer, autumn, and winter. The models were then trained and validated using subsets from each of the four seasons. The performance metrics derived from these experiments are presented in Table 1.
Since datasets from different seasons can be considered to have different data patterns, training and validating the model using different seasonal datasets can test the model's adaptability to data exhibiting different patterns. The experimental data were randomly selected for comparison with images of predictions using the proposed method, CNN, and LSTM in spring, summer, autumn, and winter. The resulting images are shown in Figure 7, Figure 8, Figure 9 and Figure 10.
It can be observed that the prediction model based on ProbSparse Attention outperforms CNN and LSTM in most cases, with slightly lower performance metrics compared to LSTM in winter, a slightly lower R2 and a slightly higher MAE in spring, and higher performance metrics in summer and autumn compared to LSTM. The performance metrics of the proposed method in this study are consistently superior to CNN networks across the different seasonal patterns. The images also visually demonstrate that the proposed method yields better prediction results compared to traditional baseline prediction methods. This indicates that the prediction model based on ProbSparse Attention is more suitable for short-term irradiance prediction.
Box plots provide a more intuitive comparison of the prediction models. By plotting the data predicted by the four models, along with the original data, on the same box plot, the visual results shown in Figure 11 are obtained. It can be seen that, compared with the LSTM network, the box plot of the PSA method is more similar to that of the original data. This indicates that, in its overall prediction, the proposed method outperforms the comparative networks in terms of prediction effectiveness.
At the same time, changing the length of the model's input sequence can affect the model's final predictive performance. The input sequence lengths for the ProbSparse attention predictive model, LSTM, and CNN neural networks were modified to 32, 96, and 168. In the predictive model designed in this paper, in order to match the input sequence length, the label length of the decoding layer was set to 18, 68, and 116, respectively. This resulted in four sets of evaluation metrics, as shown in Table 2.
It can be observed that all predictive models have improved the predictive performance to some extent as the input sequence length increases. However, during the experimental process, in order to achieve continuous point prediction, the input sequence length serves as the window size of the sliding time window. Increasing the window size of the sliding time window will increase the overall data input volume, leading to a decrease in the model’s operational efficiency. The proposed method exhibits stronger interpretability for sequences, with relatively small performance variations across different input sequence lengths. When the input sequence length is relatively small, its performance is significantly better than that of the LSTM and CNN neural network models.
7. Conclusions
Ensuring the stability of electrical grids is paramount, and this necessitates the accurate forecasting of photovoltaic power generation, due to the inherent variability and instability of solar energy. The high-precision prediction of PV output has, therefore, become a vital component in the deployment of solar energy within the power system. Accordingly, this paper proposes a deep learning network based on the ProbSparse attention mechanism for short-term irradiance prediction. The main advantages of this method are as follows: (1) compared to the neural network models commonly used in short-term irradiance prediction, this method demonstrates better predictive performance and is less prone to issues such as vanishing or exploding gradients. (2) This method introduces a novel artificial intelligence prediction model that utilizes the ProbSparse attention mechanism, which enhances the overall operational efficiency of the model, ensuring real-time and highly efficient irradiance prediction. (3) Employing dingo optimization for the autonomous optimization of the model's hyperparameters reduces the manual cost of model deployment and enhances the model's versatility.
Through extensive experimental research evaluating various performance indicators, the proposed method has shown superior precision over other reference algorithms across diverse seasons and forecasting time frames. Consequently, the forecasting model introduced in this study is considered to have considerable promise for utilization in the domain of short-term solar irradiance forecasting.
In the future, we will delve deeper into the application of attention mechanisms in the field of solar irradiance prediction. Currently, the core of our prediction approach involves utilizing the ProbSparse attention mechanism to forecast long-term time series data. However, recent studies have indicated that the practical predictive performance of attention mechanisms in time series forecasting may not surpass that of linear networks. This discrepancy may stem from the fact that the attention mechanisms we employ rely on segmenting the input sequences, akin to their application in natural language processing. However, the representation of time series features occurs over an extended period, and segmenting the input sequences might disrupt the inherent characteristics of the original sequence. This is why some studies have reverted to using linear networks for time series prediction. Nevertheless, attention mechanisms retain immense potential in the realm of time series forecasting. We aspire to address the aforementioned issues through in-depth, interpretable research into attention mechanisms, aiming to enhance their effectiveness in solar irradiance prediction.