1. Introduction
It is projected that the global population will reach 9.6 billion by 2050, further driving the demand for energy and leading to a significant increase in electricity production worldwide. According to statistics from the National Energy Administration of China, the cumulative installed capacity of grid-connected wind and solar power in China surpassed 760 million kilowatts by the end of 2022, successively broke the milestones of 800 million, 900 million, and 1 billion kilowatts, and reached 1.05 billion kilowatts by the end of 2023. This accounted for 36% of the total installed capacity, an increase of 6.4 percentage points compared with the previous year. The installed capacity of grid-connected solar power increased from 390 million kilowatts at the end of 2022 to 610 million kilowatts at the end of 2023. In recent years, distributed photovoltaic power generation has entered a phase of rapid development. Data show that in 2021, the newly installed capacity of distributed photovoltaic power exceeded that of centralized photovoltaic power for the first time, with an addition of 29.28 million kilowatts, accounting for approximately 55% of the total newly added photovoltaic generation capacity. In 2022, distributed photovoltaic power became the main mode of wind and solar power development, with a newly installed capacity of 51.11 million kilowatts, accounting for over 58% of the newly added photovoltaic generation capacity that year. By the end of September 2023, the cumulative installed capacity of household distributed photovoltaic power in China exceeded 100 million kilowatts, reaching 105 million kilowatts. With the continuous increase in the proportion of distributed photovoltaic installations, the random fluctuations and non-stationarity caused by factors such as complex weather are becoming more prominent, posing a fundamental threat to the security of the power grid [
1]. It also hinders the real-time data collection, perception, and processing required to observe, measure, adjust, and control massive distributed resources and to enhance the coordinated interaction among power sources, energy storage, loads, and the grid. According to statistics from the National Energy Administration of China, solar power generation reached 325.9 billion kilowatt-hours in 2023, with year-on-year growth of 25.1% and a solar power utilization rate of 98.0%. Nevertheless, approximately 6.52 billion kilowatt-hours of solar energy were estimated to have been wasted [
2]. Therefore, improving the accuracy of regional photovoltaic power forecasting is of the utmost importance.
Currently, there are primarily three methods for photovoltaic power forecasting: physical methods, statistical methods, and artificial intelligence methods. Physical methods involve complex modeling processes which require the integration of meteorological and engineering expertise [
3], such as astronomical models, meteorological models, and radiation models. While physical methods theoretically provide a deep understanding of photovoltaic power and high accuracy, their complexity, data requirements, and limitations may pose challenges in practical applications. Statistical methods include fuzzy theory [
4], Markov chains [
5], and regression analysis [
6,
7]. Since photovoltaic power is influenced by multiple input variables such as weather conditions, light intensity, and temperature, there are complex interactions and nonlinear relationships among these factors, which statistical methods often struggle to accurately capture and model. With the rapid development of artificial intelligence, deep learning methods have been widely applied in the field of photovoltaic power prediction. Ahn et al. [
8] used an LSTM network based on an RNN for short-term photovoltaic power forecasting, addressing issues such as gradient vanishing and exploding, which exist in traditional RNNs. However, as the network becomes deeper, the performance of LSTM may decline. Agga et al. utilized convolutional neural networks to capture spatial features and local details in the data. However, due to the limited perception capability of the convolutional kernel for global information, even with larger convolutional kernels, their capture range was limited, resulting in suboptimal peak prediction. To address the performance decline of LSTM with increasing network depth and the weak global information perception capability of convolutional networks, the CNN-LSTM model was proposed [
9]. The emergence of this hybrid model has enabled people to address the existing issues of single models, leading to the emergence of numerous hybrid methods. For instance, Qi et al. [
10] employed a CNN-LSTM model to forecast short-term loads in integrated energy systems, demonstrating that this model outperforms the CNN and LSTM models in terms of prediction accuracy. Niu et al. [
11] utilized attention mechanisms to optimize a CNN-BiGRU model for short-term multi-energy load forecasting, resulting in an average MAPE improvement of 66.09% compared with the single LSTM model. Gao et al. [
12] used a CNN-BiLSTM model to predict the remaining lifespans of lithium-ion batteries in electric vehicles, showcasing through comparisons with other classical models that hybrid models possess higher generalization capabilities and prediction accuracy. The above methods have been proven to improve the accuracy of individual power predictions in power plants. However, the optimization problem of deep learning models is often non-convex, meaning that there are multiple local optima and saddle points. As the number of hybrid models increases, the number of local optima and saddle points also increases, making it easier for the model to become trapped in local optima and saddle points and thus fail to achieve the best prediction accuracy. To address these issues, researchers have proposed using optimization algorithms to find the optimal solutions in models, such as LCASO-BP [
13], GA-LSTM [
14], ED-LSTM [
15], and PSO-VMDFE-WHO-CNN [
16]. These methods have demonstrated the feasibility of using swarm intelligence optimization algorithms to find the optimal hyperparameters in models, thereby improving the accuracy of individual power plant predictions. However, most current photovoltaic power forecasting focuses on individual power plants, and there is limited research on forecasting regional photovoltaic power plants.
There are three main methods for regional photovoltaic power forecasting: cumulative methods, extrapolation methods, and statistical scale methods [
17]. (1) In cumulative methods, the power of every individual photovoltaic power station in the large area is predicted, and the results are summed directly to obtain the regional prediction. This method is simple to implement, but its drawbacks are obvious. The number of photovoltaic power stations in a large area is extremely large, and if there is a large error in the forecast for one of the power stations, the error accumulates during summation. Moreover, the power stations are distributed across different areas and therefore have different characteristics, so features must be built for each power station individually, which requires a great deal of work. (2) In extrapolation methods, the region’s distributed photovoltaic power plants are divided into several sub-regions. Representative power values are selected to predict the output of each sub-region, and then the output values of the sub-regions are summed to obtain the output value of the entire region. (3) Statistical scale methods involve dividing a large region into multiple sub-regions. Within each sub-region, representative power plants are selected, and their power outputs are individually predicted. Finally, using mathematical statistics, the weight coefficient of each representative power plant is calculated based on the proportion of the baseline power plant’s generation in its sub-region’s photovoltaic power plant group. The regional photovoltaic power forecast is then calculated through weighted aggregation.
The aforementioned methods for sub-region division include nearest neighbor propagation clustering [
18], grid-partitioned sampling [
19], k-means clustering [
2], and hierarchical clustering [
20]. After sub-region division, representative photovoltaic power plants are selected from each sub-region. Li et al. [
2] calculated correlation coefficients (such as the Pearson correlation coefficient and Spearman’s rank correlation coefficient) between the power generation of each photovoltaic power plant and the total power generation of the sub-region. The photovoltaic power plant with the highest correlation coefficient was defined as the baseline power plant. Although the aforementioned methods are based on the statistical scale method, they often only consider the correlation between the representative power plant and the total regional power, neglecting the spatial correlation between the representative power plant and other power plants within the sub-region. This limitation restricts the utilization of data from other power plants and limits the improvement in prediction accuracy.
For example, within a sub-region, each power plant may have different capacities and installation scales, or the geographic locations and environmental conditions of the power plants may result in uneven resource distribution. In such cases, power plants with larger capacities or those situated in resource-rich areas would have a significant advantage in correlation calculations. However, representative power plants selected using this approach may not effectively represent the entire sub-region. This is because this method overlooks the spatial correlation between the representative power plant and other power plants within the sub-region. Each power plant may have different locations and relationships with other plants in space, and these relationships play an important role in the power characteristics of the entire sub-region. Methods which solely consider the overall regional power correlation may fail to capture this spatial correlation, leading to representative power plants which do not fully reflect the characteristics of the entire sub-region.
To address this issue, Simeunović et al. [
21] proposed the application of graph convolutional networks (GCNs) for multi-site photovoltaic power prediction and demonstrated its effectiveness. Zhang et al. [
22] constructed a GCN-LSTM prediction model to improve the accuracy of ultra-short-term photovoltaic power forecasting. However, the aforementioned methods have certain limitations in information aggregation when using GCNs. A GCN employs a fixed aggregation approach by linearly combining the features of the input nodes and their neighboring nodes. Furthermore, a GCN only focuses on the immediate neighbors of a node and ignores distant nodes. This restriction can limit the node representation to the information within its local neighborhood, failing to fully utilize the global information of the entire graph. To overcome this limitation, Velickovic et al. [
23] introduced graph attention networks (GATs) and incorporated an attention mechanism to model the relationships between nodes. In contrast to GCNs, GATs dynamically aggregate information from neighboring nodes by learning the weights between each node and its neighbors. This adaptive aggregation allows each node to determine the extent of interaction with its neighbors, capturing the complex relationships between nodes. Additionally, GATs support multi-head attention mechanisms, where multiple attention heads are used in each layer for feature aggregation. Each attention head can learn different node relationships, providing richer graph structure information. By utilizing multiple attention heads, a GAT can simultaneously consider multiple node relationships, enhancing the model’s expressive power and generalization ability.
In summary, the existing methods for regional photovoltaic (PV) power prediction have the following limitations:
Compared with the cumulative method and extrapolation method, statistical scale methods are more suitable for regional PV power prediction. However, because different PV power plants are affected by different environmental conditions and other factors, a selected reference PV power plant may not fully represent the characteristics and variations of all PV power plants within the sub-region.
The operation and maintenance of PV power plants can lead to changes in their power generation capabilities. The proportion of power generated by the reference PV power plant may vary over time, which can affect the estimation of the reference PV power plant’s proportion.
Extrapolating the entire region’s PV power based on proportions alone does not take into account the spatiotemporal characteristics and synergistic effects among sub-regions, resulting in less accurate predictions.
To address these existing issues, we propose a distributed regional photovoltaic (PV) power prediction method based on the stacked ensemble algorithm [
24]. The main contributions are as follows:
In the process of selecting representative power plants for each sub-region, we utilize a graph attention network (GAT). The GAT models all the power plants within each sub-region by leveraging its characteristics, representing the connections and interactions among the power plants using a graph structure. By learning the weights and attention distributions between power plants, we integrate and fuse the features and relationships among different sub-regions, ultimately selecting the most representative power plants.
We employ a CNN-LSTM-multi-head attention parallel multi-channel model as the base model for the stacked ensemble algorithm. This approach fully utilizes the strengths of the CNN and LSTM models, enhancing the model’s feature representation capabilities and sequence data modeling abilities. Additionally, we incorporate a multi-head attention mechanism for comprehensive feature weighting and fusion, considering the different feature representations and sequence modeling capabilities. This results in more comprehensive and accurate model outputs.
By using the outputs of the base model as inputs to the meta-model, we incorporate the features and spatiotemporal characteristics of each sub-region into the consideration of the meta-model. The base model can perform individual feature extraction and modeling for each sub-region, capturing the spatiotemporal characteristics within the sub-region. The meta-model then integrates and consolidates the outputs of different sub-regions, better reflecting the characteristics of the entire region and predicting PV power generation.
The organizational structure of this paper is as follows. Section 1 describes the existing methods and problems in the field of regional photovoltaic power generation forecasting and puts forward solutions to these problems.
Section 2 introduces the framework of regional forecasting in this paper.
Section 3 explains the selection of representative power stations.
Section 4 introduces the stacked ensemble algorithm and the base model and meta-model selected for the algorithm. In
Section 5, the feasibility of the proposed method is proven by comparing the model with other excellent models.
Section 6 gives a final summary of this paper.
3. Selecting Representative Power Stations for Sub-Regions
When selecting representative power stations for each sub-region, the power stations are first grouped by region so that stations subject to similar weather conditions fall into the same group, yielding three sub-regions (A, B, and C), as shown in
Figure 2.
This is because power stations in the same area are often affected by similar climatic and environmental conditions, and this partitioning helps reduce the complexity of the data and the impact of noise. After the region is divided, representative power stations are selected from the many power stations in each sub-region. Within a sub-region, there is a temporal and spatial correlation between power stations; that is, the status of a power station may be affected by the surrounding power stations. Most importantly, in the selection of representative power stations, it is necessary to consider not only the relationship between each power station and its neighboring power stations but also the relationships among all power stations in the whole region.
Graph Attention Networks
This article utilizes a graph attention network (GAT) to select representative power plants within each sub-region. The GAT is a graph neural network model designed to capture both the spatial and temporal information among related power plants within the sub-region by introducing attention mechanisms between nodes. Unlike a graph convolutional network (GCN), which performs a simple average aggregation of neighboring nodes, the GAT calculates the attention weights for each node with its neighbors and aggregates them accordingly. This allows the GAT to adaptively model the contributions of different nodes to their neighbors, enabling more flexible capturing of the local structure within the graph. Since each power plant may have varying importance and contributions, this article incorporates a multi-head attention mechanism in the GAT. This mechanism learns different attention weights from multiple perspectives in parallel, based on the relationships and features of the nodes.
Figure 3 illustrates this process.
From the diagram, it is evident that each power plant in the sub-region is depicted as a node (P). The various colored arrows represent distinct attention sets. By continuously learning and incorporating the spatiotemporal features of all other power plants within the region, the attention sets are merged and averaged to yield the final output for each power plant. This method enables the model to consider the impact of each power plant from a global perspective, better capturing the interactions among power plants within the region and improving prediction accuracy.
Through iterative learning and prediction, the power plant with the most accurate predictions is chosen as the representative for the sub-region. This ensures that the selected representative power plant effectively represents the photovoltaic power generation within the entire sub-region, providing a reliable foundation for subsequent analysis and decision making. In cases where the sub-region contains only one power plant, it automatically becomes the representative for the sub-region, as shown in
Figure 4.
In
Figure 4, through the GAT, the representative power station in this sub-region is finally selected: P7. The calculation step takes the features of a group of nodes as the input, for example $h = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_s\}$ with $\vec{h}_i \in \mathbb{R}^F$, where $F$ is the dimension of the features owned by each node and $i = 1, 2, \ldots, s$, where $s$ is the number of nodes. The correlation attention between nodes $i$ and $j$ is described as follows:

$$e_{ij} = a\left(\left[W\vec{h}_i \,\Vert\, W\vec{h}_j\right]\right) \tag{1}$$

$$\alpha_{ij} = \operatorname{softmax}_j\left(e_{ij}\right) \tag{2}$$

$$\alpha_{ij} = \frac{\exp\left(\operatorname{LeakyReLU}\left(e_{ij}\right)\right)}{\sum_{k \in N_i} \exp\left(\operatorname{LeakyReLU}\left(e_{ik}\right)\right)} \tag{3}$$

$$\vec{h}_i' = \sigma\left(\frac{1}{K}\sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} \vec{h}_j\right) \tag{4}$$

In Equation (1), $W$ is the weight matrix, and $a$ is a single-layer feedforward neural network. The features of node $i$ and node $j$ are each multiplied by the weight matrix $W$, and the results are concatenated. After the concatenated result is passed through $a$, the high-dimensional features are mapped to a real number, namely the attention coefficient $e_{ij}$. In Equations (2) and (3), $N_i$ is the neighborhood of node $i$ (the set of other nodes connected to it), and the LeakyReLU activation function is adopted in the normalization operation on $e_{ij}$. Finally, combined with the multi-head attention mechanism, as shown in Equation (4), the new node features $\vec{h}_i'$ are generated, where $K$ is the number of attention heads and $\alpha_{ij}^{k}$ and $W^{k}$ are the normalized attention coefficients and weight matrix of the $k$-th head.
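To make Equations (1)–(4) concrete, the short NumPy sketch below (not the code used in this work; the dimensions, random weights, and fully connected sub-region graph are illustrative assumptions) computes the attention coefficients and the averaged multi-head output for every plant node in a small sub-region.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
s, F, F_out, K = 5, 8, 4, 3          # plants, input/output feature dims, attention heads
h = rng.normal(size=(s, F))          # node features (one row per power plant)
adj = np.ones((s, s), dtype=bool)    # fully connected sub-region graph (assumption)

outputs = []
for k in range(K):                                   # one pass per attention head
    W = rng.normal(size=(F, F_out))                  # shared weight matrix W
    a = rng.normal(size=(2 * F_out,))                # single-layer feedforward network a
    Wh = h @ W
    h_new = np.zeros((s, F_out))
    for i in range(s):
        nbrs = np.where(adj[i])[0]
        e = np.array([a @ np.concatenate([Wh[i], Wh[j]]) for j in nbrs])  # Eq. (1)
        alpha = softmax(leaky_relu(e))                                    # Eqs. (2)-(3)
        h_new[i] = (alpha[:, None] * Wh[nbrs]).sum(axis=0)
    outputs.append(h_new)

h_prime = np.tanh(np.mean(outputs, axis=0))          # average the K heads, Eq. (4)
print(h_prime.shape)                                 # (5, 4): new feature vector per plant
```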
4. Stacked Ensemble Algorithm
The accuracy of PV power prediction in large-scale regions is of the utmost importance for energy planning and operational management. To ensure a reliable energy supply and effective operational decision making, higher-precision models are required for predicting PV power in large-scale regions. Building upon the statistical scale method, this paper divides a large-scale region into multiple sub-regions. To better capture the shared features and similarity relationships among the sub-regions, a combination model based on stacked ensemble algorithms is employed for predicting PV power in the large-scale region.
The stacked ensemble algorithm improves overall prediction performance by combining multiple base models. Its main concept involves using the predictions of the base models as input features and utilizing another meta-model for the final prediction. This algorithm effectively integrates the strengths of multiple models. By incorporating predictions from multiple base models, it can better adapt to new and unseen data. This generalization ability allows the model to demonstrate higher prediction capabilities and increased reliability when encountering various PV power generation scenarios in different sub-regions. Furthermore, the algorithm can be easily expanded to incorporate additional base models and adjusted and improved as needed, providing enhanced scalability and flexibility.
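The following skeleton illustrates this two-level flow on synthetic data; simple Ridge regressors stand in for the CNN-LSTM-MHA base models and the GRU meta-model used in this paper, and for brevity plain train-set predictions feed the meta-model, whereas out-of-fold predictions would normally be preferred to reduce leakage.

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in models for a minimal stacking sketch

rng = np.random.default_rng(0)
n_samples, n_features, n_subregions = 1000, 6, 3

# Hypothetical data: one feature matrix and power series per sub-region,
# plus the total regional power as the meta-model target.
X = [rng.normal(size=(n_samples, n_features)) for _ in range(n_subregions)]
y_sub = [x @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_samples) for x in X]
y_region = np.sum(y_sub, axis=0)

split = int(0.8 * n_samples)
base_models, base_preds_train, base_preds_test = [], [], []
for x, y in zip(X, y_sub):
    m = Ridge().fit(x[:split], y[:split])      # level 0: one base model per sub-region
    base_models.append(m)
    base_preds_train.append(m.predict(x[:split]))
    base_preds_test.append(m.predict(x[split:]))

# Level 1: base-model predictions become the meta-model's input features.
Z_train = np.column_stack(base_preds_train)
Z_test = np.column_stack(base_preds_test)
meta = Ridge().fit(Z_train, y_region[:split])
regional_forecast = meta.predict(Z_test)
print(regional_forecast[:5])
```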
4.1. Basic Model
In the process of photovoltaic power generation, many influential characteristics are often involved, such as weather data (sunshine time, temperature, humidity, etc.), geographical location information, and photovoltaic module performance parameters. These data can have highly nonlinear relationships, and there can be temporal and spatial correlations and interaction effects, which also make it difficult for simple machine learning models to capture complex patterns and associations in the data. In this paper, the CNN-LSTM multi-channel parallel (CNN-LSTM (PC)) model is selected in the basic model training stage, which can maximize the advantages of combining the CNN and LSTM approaches, as shown in
Figure 5.
Firstly, the parallel training of the CNN and LSTM models allows for comprehensive utilization of both the temporal and spatial features in the data. The CNN is capable of extracting spatial features from the data, while the LSTM model, with its memory cells and gating mechanisms, selectively remembers and forgets past information. This enables the model to better capture long-term patterns and trends in the power data of sub-regions. Additionally, there may exist spatial correlations in the power data of the region, meaning that adjacent sub-regions may have similar or related power values. Through its recursive structure, LSTM can simultaneously handle both time series and spatial correlations, thereby effectively leveraging the interdependence between sub-regions for prediction. By training these two models in parallel, we can consider the spatiotemporal information within the power plant data simultaneously, leading to more comprehensive modeling and prediction of power generation.
Secondly, through the multi-head attention mechanism, we can merge the outputs of the CNN and LSTM models. The multi-head attention mechanism allows for the weighting and fusion of different feature subspaces, enabling the model to focus more on important features and spatiotemporal relationships. This mechanism enhances the model’s perception of critical features in the power plant data, thus improving the accuracy and robustness of predictions.
Lastly, the final prediction is obtained by mapping the output of the multi-head attention mechanism through fully connected layers. The fully connected layers enable the combination and transformation of features, converting the high-level features extracted by the multi-head attention mechanism into the final prediction output. This structure possesses strong expressive power, being capable of adapting to complex patterns in power plant data and producing accurate prediction results. The structure and related parameters of the model are shown in
Table A1 in
Appendix A.
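A minimal Keras sketch of this parallel multi-channel structure is shown below; the layer sizes, window length, and head count are illustrative assumptions rather than the parameters listed in Table A1.

```python
from tensorflow.keras import layers, Model

timesteps, n_features = 12, 6        # illustrative input window and feature count

inputs = layers.Input(shape=(timesteps, n_features))

# Channel 1 (CNN): local/spatial feature extraction along the time axis.
cnn = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)

# Channel 2 (LSTM): long-term temporal dependencies.
lstm = layers.LSTM(64, return_sequences=True)(inputs)

# Multi-head attention fuses the two channels (query: CNN features; key/value: LSTM features).
fused = layers.MultiHeadAttention(num_heads=4, key_dim=16)(cnn, lstm)
fused = layers.GlobalAveragePooling1D()(fused)

# Fully connected layers map the fused features to the power prediction.
outputs = layers.Dense(64, activation="relu")(fused)
outputs = layers.Dense(1)(outputs)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()
```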
4.1.1. Convolutional Neural Networks
Convolutional neural networks (CNNs) have achieved significant success in image processing, and their feature extraction capability plays a crucial role in time series data as well. Time series data often contain complex patterns and structures, and traditional manual feature engineering is challenging for extracting such features. Through the hierarchical stacking of convolutional operations, a CNN can efficiently capture the spatial features among the data.
Moreover, a specific moment in time series data is usually correlated with its neighboring moments. Convolutional operations can capture local dependencies when processing time series data. By defining appropriate kernel sizes, the CNN can effectively capture local patterns and their evolution in the time series data. Additionally, by using different kernel sizes or multiple layers of convolution, the CNN can achieve a multi-scale representation of the time series data. Smaller kernels can capture detailed features, while larger kernels can capture more macroscopic trend features. Multiple layers of convolution can gradually extract abstract features from the time series data, enabling modeling of different time scales. The CNN’s implementation formulae are as follows:
$$y_j = \sigma\left(\sum_{m=1}^{k} w_m x_{j+m-1} + b\right)$$

$$p_j = \max\left(y_{(j-1)t+1}, y_{(j-1)t+2}, \ldots, y_{(j-1)t+q}\right)$$

where $y_j$ is the output sequence of the convolution operation, $j$ is the index of the output sequence, $k$ is the size of the convolution kernel, $q$ is the size of the pooling window, and $t$ is the step length of the pooling window; $x$, $w$, $b$, and $\sigma$ denote the input sequence, the kernel weights, the bias, and the activation function, and $p_j$ is the pooled output.
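As a small worked example of the two formulae above (assuming a valid, unpadded convolution window, a ReLU activation, and max pooling):

```python
import numpy as np

x = np.array([0.1, 0.4, 0.35, 0.8, 0.75, 0.3, 0.2, 0.6])   # toy input sequence
w = np.array([0.25, 0.5, 0.25])                             # convolution kernel (k = 3)
b = 0.0
k, q, t = len(w), 2, 2                                      # kernel size, pooling window, stride

# Convolution: y_j = sigma(sum_m w_m * x_{j+m-1} + b), with a ReLU as sigma.
y = np.array([max(0.0, np.dot(w, x[j:j + k]) + b) for j in range(len(x) - k + 1)])

# Max pooling with window q and stride t.
p = np.array([y[j:j + q].max() for j in range(0, len(y) - q + 1, t)])
print(y, p)
```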
4.1.2. Long Short-Term Memory
Long short-term memory (LSTM) is a variant of recurrent neural networks (RNNs) which has achieved significant success in handling time series data. Time series data often exhibit long-term dependencies and temporal relationships, and traditional RNNs face challenges in dealing with long-term dependencies due to the vanishing or exploding gradient problem. LSTM effectively addresses these issues by introducing gate mechanisms.
The main idea of LSTM is to store and update information through a memory unit called a “cell”. The cell consists of a forget gate, an input gate, and an output gate, each with learnable weights which determine whether to pass or update information. The forget gate decides whether previously stored memory should be forgotten, the input gate determines how new information is integrated into the memory, and the output gate determines how much of the output memory is passed to the next time step.
The key aspect of LSTM lies in its ability to update and retain long-term memory effectively. Through control of the forget gate and input gate, LSTM can selectively forget or store information, enabling more accurate handling of long-term dependencies. This allows LSTM to capture important patterns and structures in time series data without being limited by vanishing or exploding gradients.
Similar to CNNs, LSTM can also extract more abstract features through the stacking of multiple layers. Each LSTM layer can capture different levels of time scales, ranging from lower-level detailed features to higher-level abstract features. This multi-layer structure facilitates a deeper understanding and modeling of time series data by LSTM. The formulae for each state at time
$t$ are as follows:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

$$h_t = o_t \odot \tanh\left(c_t\right)$$

where $W_i$, $W_f$, $W_o$, and $W_c$ are the weight matrices of the input gate, forget gate, output gate, and memory unit, respectively, $U_i$, $U_f$, $U_o$, and $U_c$ are the weight matrices of the hidden layer, and $b_i$, $b_f$, $b_o$, and $b_c$ are the bias values. The memory unit $c_t$ is updated through the forget gate and the input gate. The forget gate determines how much information from the memory unit $c_{t-1}$ at the previous time is retained, and the input gate determines how much new information $\tilde{c}_t$ is added to the memory unit state.
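A single LSTM time step written directly from these formulae (NumPy, random weights, illustrative dimensions) may help clarify how the gates interact:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hidden = 6, 8
x_t = rng.normal(size=n_in)              # input at time t
h_prev = np.zeros(n_hidden)              # previous hidden state h_{t-1}
c_prev = np.zeros(n_hidden)              # previous cell state c_{t-1}

# W*: input weights, U*: recurrent weights, b*: biases (i, f, o, c).
W = {g: rng.normal(size=(n_hidden, n_in)) for g in "ifoc"}
U = {g: rng.normal(size=(n_hidden, n_hidden)) for g in "ifoc"}
b = {g: np.zeros(n_hidden) for g in "ifoc"}

i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])       # input gate
f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])       # forget gate
o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])       # output gate
c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # candidate memory
c_t = f_t * c_prev + i_t * c_tilde                           # updated cell state
h_t = o_t * np.tanh(c_t)                                     # new hidden state
print(h_t.shape)
```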
4.1.3. Multi-Head Attention Mechanism
In the multi-head attention mechanism, each attention head has an independent weight allocation mechanism which determines how much attention each head pays to the input. Each attention head generates a weight coefficient vector which is used to weight the sum of the input and obtain a representation of that head. In this way, multiple attention heads can learn different attention patterns in different feature subspaces, thus providing diversified information expression. The formulae for calculating the multi-head attention mechanism are as follows:
$$\operatorname{MultiHead}(Q, K, V) = \operatorname{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^O$$

$$\mathrm{head}_i = \operatorname{Attention}\left(Q W_i^Q, K W_i^K, V W_i^V\right) = \operatorname{softmax}\left(\frac{Q W_i^Q \left(K W_i^K\right)^{\top}}{\sqrt{d_k}}\right) V W_i^V$$

In these formulae, $Q$, $K$, and $V$ represent the query vector, key vector, and value vector, respectively, $h$ represents the number of heads, $\mathrm{head}_i$ represents the output of head $i$, and $W^O$ represents the output transformation matrix. $W_i^Q$, $W_i^K$, and $W_i^V$ represent the query, key, and value matrices of head $i$, respectively, $d_k$ is the dimension of the key vector, and softmax performs the similarity normalization.
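The same computation in a compact NumPy form (self-attention on a single sequence, with random projection matrices and illustrative dimensions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
seq, d_model, n_heads = 10, 32, 4
d_k = d_model // n_heads
X = rng.normal(size=(seq, d_model))               # here Q = K = V = X (self-attention)

heads = []
for i in range(n_heads):
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(d_k))      # softmax(QK^T / sqrt(d_k))
    heads.append(scores @ V)                      # head_i

W_o = rng.normal(size=(n_heads * d_k, d_model))   # output transformation matrix
output = np.concatenate(heads, axis=-1) @ W_o
print(output.shape)                               # (10, 32)
```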
4.2. Meta-Model
In forecasting photovoltaic power generation over large areas, we pay more attention to the forecast of overall power generation. Because the spatial correlation between power stations in a large region may not be obvious, and because short-term changes and fluctuations sometimes have a greater impact on the overall forecast results, the GRU is used as the meta-model in this paper.
Gated Recurrent Unit
The gated recurrent unit (GRU), compared with LSTM, has fewer gate units, allowing it to more effectively capture rapid changes and short-term patterns in large-area power data. Additionally, large areas may exhibit significant differences due to geographical location, weather, and other factors. The GRU model can better adapt to the variability between different regions and quickly adjust to changes in different areas. This enables it to accurately capture the characteristics and changing patterns of different regions when predicting power in a large area. As a meta-model, the GRU can synthesize predictions from sub-regions and further forecast the power for an entire large area. It can weigh and adjust the predictions from sub-regions, thereby improving the accuracy for the entire large area.
In this paper, a GRU is used as the meta-model of the stacked ensemble algorithm, and the GRU implementation formulae are as follows:

$$z_t = \sigma\left(W_z \cdot \left[h_{t-1}, x_t\right]\right)$$

$$r_t = \sigma\left(W_r \cdot \left[h_{t-1}, x_t\right]\right)$$

$$\tilde{h}_t = \tanh\left(W_h \cdot \left[r_t \odot h_{t-1}, x_t\right]\right)$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $x_t$ is the input of the current time step, $h_t$ is the hidden state of the current time step, the update gate $z_t$ is used to control the weight between the hidden state of the previous moment and the candidate hidden state, the reset gate $r_t$ is used to control the influence of the previous hidden state on the candidate state, $\tilde{h}_t$ is the candidate hidden state, and $W_z$, $W_r$, and $W_h$ are weight parameters. See Appendix A for details.
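For comparison with the LSTM step above, a single GRU time step follows the formulae directly (NumPy, random weights, illustrative dimensions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
n_in, n_hidden = 6, 8
x_t = rng.normal(size=n_in)
h_prev = np.zeros(n_hidden)

# Each weight acts on the concatenation [h_{t-1}, x_t], as in the formulae above.
W_z, W_r, W_h = (rng.normal(size=(n_hidden, n_hidden + n_in)) for _ in range(3))

z_t = sigmoid(W_z @ np.concatenate([h_prev, x_t]))            # update gate
r_t = sigmoid(W_r @ np.concatenate([h_prev, x_t]))            # reset gate
h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate hidden state
h_t = (1 - z_t) * h_prev + z_t * h_tilde                      # new hidden state
print(h_t.shape)
```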
4.3. Model Evaluation
In this paper, the root mean square error (RMSE), mean absolute error (MAE), and R-squared value ($R^2$) were used as evaluation indices to evaluate the prediction results of the proposed model and the comparison models in different seasons and at different time scales. The specific formulae are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$

$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}$$

In the formulae, $y_t$ and $\hat{y}_t$ are the true value and predicted value at time $t$, respectively, $\bar{y}$ is the average of the true values, and $n$ is the total number of test samples.
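These three indices can be computed directly; a small NumPy helper with toy values (illustrative only) is given below.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, and R^2 as defined above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, r2

print(evaluate([10.0, 12.5, 9.8, 11.1], [9.6, 12.9, 10.2, 11.0]))
```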
5. Experimental Research
5.1. Data Description
The experimental hardware set-up for this study included a 2.5 GHz Intel(R) Core(TM) i7-11700H CPU with 32.00 GB of memory, and implementation was performed using the TensorFlow framework and the Python language. The experimental data for this study were provided by the Desert Knowledge Australia Solar Centre (DKASC) and originated from the Yulara Solar System in the Ayers Rock region of Australia. Installed in 2014, the Yulara Solar System is an operating 1.8 MW solar photovoltaic plant which was developed with the support of the Australian Renewable Energy Agency (ARENA). Comprising five sub-systems distributed across the local township of Yulara, it sits beside Central Australia’s renowned landmark, Uluru (Ayers Rock), and generates electricity for the local grid.
There were a total of 8 meteorological forecast-related fields: Wind_Speed, Temperature, Global_Horizontal_Radiation, Wind_Direction, Max_Wind_Speed, Air_Pressure, Pyranometer_1, and Pyranometer_2. The details are shown in
Table 1. The dataset can be publicly downloaded from the website corresponding to [
25].
Each dataset contained environmental factors and photovoltaic power output data collected at 5 min intervals, resulting in 288 data points per day. Due to human error, communication failures, or other issues during the data collection process, there may have been outliers and missing data. In this study, the 3 sigma criterion was applied to identify and remove outliers. This criterion defines data points which exceed three times the standard deviation as outliers. For missing data points, the Hermite interpolation method was used to estimate reasonable values based on the values and derivative information of existing data points.
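A sketch of this preprocessing step is given below; it uses SciPy’s PchipInterpolator (a piecewise cubic Hermite interpolating polynomial) as one possible Hermite-type interpolator, and the series values are synthetic, so the exact routine used in the study may differ.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import PchipInterpolator

def clean_series(s: pd.Series) -> pd.Series:
    """Apply the 3-sigma criterion and fill the resulting gaps by Hermite-type interpolation."""
    mu, sigma = s.mean(), s.std()
    s = s.where((s - mu).abs() <= 3 * sigma)      # outliers (and existing gaps) become NaN
    pos = np.arange(len(s))
    known = s.notna().to_numpy()
    interp = PchipInterpolator(pos[known], s.to_numpy()[known])
    return pd.Series(interp(pos), index=s.index)

# Synthetic 5 min power series with one artificial spike and one missing sample.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=48, freq="5min")
power = pd.Series(3.0 + 0.2 * rng.standard_normal(48), index=idx)
power.iloc[10] = 40.0        # outlier removed by the 3-sigma rule
power.iloc[20] = np.nan      # gap filled by interpolation
print(clean_series(power).iloc[[10, 20]])
```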
As different seasons can have varying impacts on photovoltaic power generation, considering the influence of factors such as sunlight and temperature, the relationship between photovoltaic power generation and meteorological factors may exhibit different patterns at different time scales. Therefore, the dataset was divided into seasons: spring, summer, autumn, and winter. Furthermore, for each season, the data were divided into different time intervals: 1 h, 3 h, and 5 h. This division allowed for better capturing of seasonal patterns and trends between photovoltaic power generation and meteorological factors. Additionally, by modeling multiple time scales, the dynamic characteristics of the data could be more comprehensively captured.
5.2. Correlation Analysis
In photovoltaic power prediction tasks, there are numerous features which can potentially affect the power output. Selecting appropriate input features is crucial for establishing an accurate prediction model. Due to the presence of a large number of potential influencing factors, feature selection helps to identify features strongly correlated with the power output, reducing the interference of redundant information and noise. Choosing highly correlated features as inputs enables better capture of the factors influencing power and improves the accuracy and interpretability of the prediction model.
This study employed the maximal information coefficient (MIC) and the Pearson correlation coefficient for feature selection and partitioning. The MIC, as a non-parametric measure of correlation, can identify associations between variables of any type, including nonlinear relationships. This capability allows it to discover complex correlations hidden within photovoltaic power data, assisting in finding features strongly correlated with power output without being limited by assumptions about feature distributions. The formula for calculating the MIC is as follows:

$$\mathrm{MIC}(x, y) = \max_{a \times b < B} \frac{I(x; y)}{\log_2 \min(a, b)}$$

where $I(x; y)$ is the amount of mutual information between $x$ and $y$, $a$ and $b$ are the numbers of grid units, and $B$ is usually set to the total number of samples raised to the power of 0.6.
The Pearson correlation coefficient is a common linear correlation measure which is suitable for quantifying the strength of a linear relationship. By using the Pearson correlation coefficient, features with a high linear correlation with the power output can be identified. The relevant calculation formula is as follows:

$$\rho_{x,y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

where $\operatorname{cov}(x, y)$ is the covariance of $x$ and $y$, and $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$, respectively.
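A sketch of this screening step is shown below. Pearson correlations are computed with pandas; the MIC is computed with the third-party minepy package (assumed here for illustration, with alpha = 0.6 matching the grid-size exponent mentioned above), and the data frame is synthetic.

```python
import numpy as np
import pandas as pd
from minepy import MINE   # assumed third-party dependency for the MIC calculation

def screen_features(df: pd.DataFrame, target: str = "Power") -> pd.DataFrame:
    """Rank candidate features by their Pearson correlation and MIC with the power output."""
    mine = MINE(alpha=0.6, c=15)
    rows = []
    for col in df.columns.drop(target):
        pearson = df[col].corr(df[target])                              # linear correlation
        mine.compute_score(df[col].to_numpy(), df[target].to_numpy())   # MIC (nonlinear)
        rows.append({"feature": col, "pearson": pearson, "mic": mine.mic()})
    return pd.DataFrame(rows).sort_values("mic", ascending=False)

# Synthetic demonstration frame; in the study, the fields of Table 1 plus the PV power are used.
rng = np.random.default_rng(0)
rad = rng.uniform(0, 1000, 500)
demo = pd.DataFrame({
    "Global_Horizontal_Radiation": rad,
    "Temperature": 20 + 0.01 * rad + rng.normal(0, 2, 500),
    "Power": 0.0015 * rad + rng.normal(0, 0.05, 500),
})
print(screen_features(demo))
```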
Correlation analysis was carried out between the historical data of actual photovoltaic power generation, relevant meteorological data, and the characteristics of the photovoltaic panels themselves. The correlation analysis results are shown in
Figure 6.
The results show that Wind_Speed (wind speed), Temperature (temperature), Radiation (radiation), Max_Wind_Speed (maximum wind speed), Pyranometer_1 (probe 1’s temperature), and Pyranometer_2 (probe 2’s temperature) had a great influence on photovoltaic power generation. Therefore, these influencing factors were selected as features to be input into the model.
5.3. Experimental Results
5.3.1. Comparison Experiments for Sub-Region Representative Power Station Selection
In
Section 3, the selection method for the sub-regional representative power stations was introduced: the power station with the best GAT-based prediction result is chosen as the representative power station of its sub-region. Sub-region A, for example, had five power stations: DG, SS, SD-1A, SD-2A, and SD-3A. By constructing a GAT, the power of the five power stations was predicted and evaluated using the RMSE and MAE indices.
Table 2,
Table 3,
Table 4 and
Table 5 list the results.
Based on the data in
Table 2,
Table 3,
Table 4 and
Table 5, significant observations can be made regarding the prediction accuracy. The RMSE and MAE values for station SD-1A were noticeably smaller than those of the other power stations in the sub-region, indicating that station SD-1A exhibited exceptional prediction accuracy. Taking into account the aforementioned metrics, we can conclude that station SD-1A achieved the best prediction results by iteratively learning and integrating the spatiotemporal features from all other power stations in the region. Therefore, station SD-1A can be considered representative of the entire sub-region. Furthermore, tests conducted across different seasons demonstrate that station SD-1A, as a representative power station, exhibited superior generalization capabilities, performing well in predictions across diverse seasons. This further confirms the superiority of station SD-1A in capturing seasonal variations and its predictive abilities. Based on these results, we selected SD-1A as the representative power station for region A. The same validation process was carried out for regions B and C, identifying representative power stations for their respective sub-regions.
5.3.2. Sub-Region Basic Model Comparison Test
In this study, a model based on the stacked ensemble algorithm was employed to predict the distributed photovoltaics (PVs) in regional areas. The basic model was used to predict the power generation in the sub-regions, and its output served as the input for the meta-model, which was used to predict power generation in the larger region. The accuracy of the basic model’s predictions directly impacts the quality of the input data for the meta-model. If the basic model’s predictions exhibit significant errors, then these errors will propagate to the meta-model, thereby affecting the prediction results for the larger region’s power generation. Additionally, if the selected basic model can adequately learn and capture the spatiotemporal features of the sub-regions, then the meta-model will benefit from these valuable feature representations. Conversely, if the selected basic model has weak feature learning capabilities, then it may not provide sufficient information for predicting the larger region, resulting in decreased prediction accuracy. Liu et al. [
26] demonstrated the feasibility and effectiveness of parallel network structures in the PV domain by utilizing multimodal decomposition and combining parallel bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN). Compared with traditional single-channel networks, parallel network structures exhibit stronger generalization and robustness.
In this paper, model 2 (CNN-LSTM-MHA (PC)) was used as the basic model and compared with model 1 (CNN-GRU-MHA (PC)) and model 3 (CNN-BiLSTM-MHA (PC)) in different seasons and different time steps. Taking region A as an example, the model evaluation results are shown in
Figure 7.
In this experiment, dividing the data according to different seasons helped to better capture the seasonal patterns and trends between photovoltaic power generation and meteorological factors. Different seasons’ meteorological factors, such as sunlight and temperature, have varying impacts on photovoltaic power generation. Additionally, predictions were made for 1 h, 3 h, and 5 h ahead. By forecasting the photovoltaic power generation at different time points, the model’s robustness and practical benefits can be better evaluated. From
Figure 7a, it can be seen that model 2’s MAE metric decreased by an average of 1.77 kW and 1.1 kW compared with model 1 and model 3 in the 1 h, 3 h, and 5 h predictions. Similarly, in
Figure 7b, model 2 showed an average reduction of 2.63 kW and 0.22 kW compared with the other two models. The results indicate that by leveraging the strengths of the CNN model in capturing the local features of the data and LSTM in capturing the long-term dependencies, the CNN-LSTM-MHA (PC) model constructed in this study and trained in parallel can maximize model performance. When LSTM was replaced by a GRU and BiLSTM under the same model structure, although good prediction results were achieved, there were still some issues. For instance, the GRU struggled to capture complex data features due to fewer gating units, and BiLSTM required a larger number of parameters, making it prone to overfitting. The model evaluation metrics for sub-region A are shown in
Table 6,
Table 7 and
Table 8, while detailed data for sub-regions B and C can be found in
Table A2,
Table A3,
Table A4,
Table A5,
Table A6 and
Table A7 of
Appendix A.
5.3.3. Large-Area Meta-Model Comparison Test
Large regions typically encompass a wide range of geographical and climatic conditions, making it challenging to select suitable data features for large-area prediction. Additionally, there may be strong spatial correlations among data from different sub-regions. The stacked ensemble algorithm leverages the outputs of base models as new features for the meta-model, addressing the inadequacies of large-area data features and enhancing spatial correlations among different sub-regions. In this study, a GRU was employed as the meta-model to forecast power generation in a large region. To further validate the feasibility of the proposed model, comparative experiments were conducted with classical models such as CNN-LSTM-MHA (SC), BiLSTM-CNN (SC), and the traditional extrapolation method. The prediction data for the sub-regions were all based on the results predicted by the CNN-LSTM-MHA (PC) model. The extrapolation method involved summing the predictions from individual sub-regions. The comparison results are shown in
Figure 8, where the data were randomly selected from one day of the final results.
From the above comparison of the prediction results, it can be observed that the proposed method, the other classical models, and the traditional method are all capable of capturing the trends in photovoltaic power generation, indicating that under normal circumstances they can provide a certain level of prediction accuracy. However, during extreme weather conditions, there were significant abnormal fluctuations in photovoltaic power generation. In comparison with the other methods, the method proposed in this study can more accurately capture the detailed features of the data and demonstrates better fitting at the turning points of the curves. This suggests that this method can more precisely predict the changes in photovoltaic power generation at different time points and identify the inflection points of the power curve. The specific evaluation metrics are shown in
Table 9,
Table 10 and
Table 11.
Based on the data tables, it is evident that regardless of the season or time scale, using a GRU as the meta-model yielded significantly better results than the other classical models. Compared with BiLSTM-CNN (SC), CNN-LSTM-MHA (SC), and the extrapolation method, respectively, our proposed model reduced the MAE, averaged over the different time scales, by 5.49 kW, 8.28 kW, and 0.95 kW in spring; 4.97 kW, 5.99 kW, and 0.87 kW in summer; 7.9 kW, 9.73 kW, and 2.56 kW in autumn; and 14.07 kW, 17.68 kW, and 6.19 kW in winter. Therefore, in scenarios where there are fewer features for a large area, the simplicity of the GRU model’s gating mechanism allows it to swiftly and effectively capture crucial information from the data, leading to superior outcomes.
Although the extrapolation method slightly outperformed the other two models in certain metrics, it still fell short compared with our proposed model. The strength of the extrapolation method lies in its simplicity and ease of implementation, but its performance heavily relies on the accuracy of the base model and reasonable sub-region divisions. Insufficient accuracy in the base model or overly detailed sub-region divisions can lead to error accumulation and decreased prediction accuracy.
Obtaining features from large-area data is challenging, resulting in a lower feature count. In such scenarios, models like BiLSTM-CNN (SC) and CNN-LSTM-MHA (SC), due to their complex structures, are prone to getting stuck in local optima during training, leading to significantly increased training times.
This indirectly underscores the importance of choosing the right models when dealing with diverse data. For sub-regions, where data complexity and influential features are high, a single model might not adequately capture essential information, necessitating the utilization of different models’ strengths by adjusting network structures for training and prediction. In contrast, for large areas with fewer features, complex model structures may not necessarily improve predictive accuracy; instead, they may increase the training time and risk of overfitting.