1. Introduction
It is projected that the global population will reach 9.6 billion by 2050, further driving the demand for energy and leading to a significant increase in electricity production worldwide. According to statistics from the National Energy Administration of China, the cumulative installed capacity of grid-connected wind and solar power in China surpassed 760 million kilowatts by the end of 2022, successively broke the milestones of 800 million, 900 million, and 1 billion kilowatts, and reached 1.05 billion kilowatts by the end of 2023. This accounted for 36% of the total installed capacity, an increase of 6.4 percentage points compared with the previous year. The installed capacity of grid-connected solar power increased from 390 million kilowatts at the end of 2022 to 610 million kilowatts at the end of 2023. In recent years, distributed photovoltaic power generation has entered a phase of rapid development. Data show that in 2021, the newly installed capacity of distributed photovoltaic power exceeded that of centralized photovoltaic power for the first time, with an addition of 29.28 million kilowatts, accounting for approximately 55% of the total newly added photovoltaic generation capacity. In 2022, distributed photovoltaic power became the main mode of wind and solar power development, with a newly installed capacity of 51.11 million kilowatts, accounting for over 58% of the newly added photovoltaic generation capacity that year. By the end of September 2023, the cumulative installed capacity of household distributed photovoltaic power in China exceeded 100 million kilowatts, reaching 105 million kilowatts. With the continuous increase in the proportion of distributed photovoltaic installations, the random fluctuations and non-stationarity caused by factors such as complex weather are becoming more prominent, posing a fundamental threat to the security of the power grid [
1]. It also hinders the real-time data collection, perception, and processing required to observe, measure, adjust, and control massive distributed resources and to enhance the coordinated interaction among power sources, energy storage, loads, and the grid. According to statistics from the National Energy Administration of China, solar power generation reached 325.9 billion kilowatt-hours in 2023, with year-on-year growth of 25.1% and a solar power utilization rate of 98.0%. Nevertheless, approximately 6.52 billion kilowatt-hours of solar energy were estimated to have been wasted [
2]. Therefore, improving the accuracy of regional photovoltaic power forecasting is of the utmost importance.
Currently, there are primarily three methods for photovoltaic power forecasting: physical methods, statistical methods, and artificial intelligence methods. Physical methods involve complex modeling processes which require the integration of meteorological and engineering expertise [
3], such as astronomical models, meteorological models, and radiation models. While physical methods theoretically provide a deep understanding of photovoltaic power and high accuracy, their complexity, data requirements, and limitations may pose challenges in practical applications. Statistical methods include fuzzy theory [
4], Markov chains [
5], and regression analysis [
6,
7]. Since photovoltaic power is influenced by multiple input variables such as weather conditions, light intensity, and temperature, there are complex interactions and nonlinear relationships among these factors, which statistical methods often struggle to accurately capture and model. With the rapid development of artificial intelligence, deep learning methods have been widely applied in the field of photovoltaic power prediction. Ahn et al. [
8] used an LSTM network based on an RNN for short-term photovoltaic power forecasting, addressing issues such as gradient vanishing and exploding, which exist in traditional RNNs. However, as the network becomes deeper, the performance of LSTM may decline. Agga et al. utilized convolutional neural networks to capture spatial features and local details in the data. However, due to the limited perception capability of the convolutional kernel for global information, even with larger convolutional kernels, their capture range was limited, resulting in suboptimal peak prediction. To address the performance decline of LSTM with increasing network depth and the weak global information perception capability of convolutional networks, the CNN-LSTM model was proposed [
9]. The emergence of this hybrid model has enabled people to address the existing issues of single models, leading to the emergence of numerous hybrid methods. For instance, Qi et al. [
10] employed a CNN-LSTM model to forecast short-term loads in integrated energy systems, demonstrating that this model outperforms the CNN and LSTM models in terms of prediction accuracy. Niu et al. [
11] utilized attention mechanisms to optimize a CNN-BiGRU model for short-term multi-energy load forecasting, resulting in an average MAPE improvement of 66.09% compared with the single LSTM model. Gao et al. [
12] used a CNN-BiLSTM model to predict the remaining lifespans of lithium-ion batteries in electric vehicles, showcasing through comparisons with other classical models that hybrid models possess higher generalization capabilities and prediction accuracy. The above methods have been proven to improve the accuracy of individual power predictions in power plants. However, the optimization problem of deep learning models is often non-convex, meaning that there are multiple local optima and saddle points. As the number of hybrid models increases, the number of local optima and saddle points also increases, making it easier for the model to become trapped in local optima and saddle points and thus fail to achieve the best prediction accuracy. To address these issues, researchers have proposed using optimization algorithms to find the optimal solutions in models, such as LCASO-BP [
13], GA-LSTM [
14], ED-LSTM [
15], and PSO-VMDFE-WHO-CNN [
16]. These methods have demonstrated the feasibility of using swarm intelligence optimization algorithms to find the optimal hyperparameters in models, thereby improving the accuracy of individual power plant predictions. However, most current photovoltaic power forecasting focuses on individual power plants, and there is limited research on forecasting regional photovoltaic power plants.
There are three main methods for regional photovoltaic power forecasting: cumulative methods, extrapolation methods, and statistical scale methods [
17]. (1) In cumulative methods, the power of every individual photovoltaic power station in the large area is predicted, and the results are summed directly to obtain the regional prediction. This method is simple to implement, but its drawbacks are obvious. The number of photovoltaic power stations in a large area is extremely large, and if there is a large error in the forecast for one of the power stations, the error accumulates during summation. Moreover, the power stations are distributed across different areas and therefore have different characteristics, so features must be built for each power station individually, which requires a great deal of work. (2) In extrapolation methods, the region’s distributed photovoltaic power plants are divided into several sub-regions. Representative power values are selected to predict the output of each sub-region, and then the output values of the sub-regions are summed to obtain the output value of the entire region. (3) Statistical scale methods involve dividing a large region into multiple sub-regions. Within each sub-region, representative power plants are selected, and their power outputs are individually predicted. Finally, using mathematical statistics, the weight coefficient of each representative power plant is calculated based on the proportion of the baseline power plant’s generation in its sub-region’s photovoltaic power plant group. The regional photovoltaic power forecast is then calculated through weighted aggregation.
The aforementioned methods for sub-region division include nearest neighbor propagation clustering [
18], grid-partitioned sampling [
19], k-means clustering [
2], and hierarchical clustering [
20]. After sub-region division, representative photovoltaic power plants are selected from each sub-region. Li et al. [
2] calculated correlation coefficients (such as the Pearson correlation coefficient and Spearman’s rank correlation coefficient) between the power generation of each photovoltaic power plant and the total power generation of the sub-region. The photovoltaic power plant with the highest correlation coefficient was defined as the baseline power plant. Although the aforementioned methods are based on the statistical scale method, they often only consider the correlation between the representative power plant and the total regional power, neglecting the spatial correlation between the representative power plant and other power plants within the sub-region. This limitation restricts the utilization of data from other power plants and limits the improvement in prediction accuracy.
For example, within a sub-region, each power plant may have different capacities and installation scales, or the geographic locations and environmental conditions of the power plants may result in uneven resource distribution. In such cases, power plants with larger capacities or those situated in resource-rich areas would have a significant advantage in correlation calculations. However, representative power plants selected using this approach may not effectively represent the entire sub-region. This is because this method overlooks the spatial correlation between the representative power plant and other power plants within the sub-region. Each power plant may have different locations and relationships with other plants in space, and these relationships play an important role in the power characteristics of the entire sub-region. Methods which solely consider the overall regional power correlation may fail to capture this spatial correlation, leading to representative power plants which do not fully reflect the characteristics of the entire sub-region.
To address this issue, Simeunović et al. [
21] proposed the application of graph convolutional networks (GCNs) for multi-site photovoltaic power prediction and demonstrated its effectiveness. Zhang et al. [
22] constructed a GCN-LSTM prediction model to improve the accuracy of ultra-short-term photovoltaic power forecasting. However, the aforementioned methods have certain limitations in information aggregation when using GCNs. A GCN employs a fixed aggregation approach by linearly combining the features of the input nodes and their neighboring nodes. Furthermore, a GCN only focuses on the immediate neighbors of a node and ignores distant nodes. This restriction can limit the node representation to the information within its local neighborhood, failing to fully utilize the global information of the entire graph. To overcome this limitation, Velickovic et al. [
23] introduced graph attention networks (GATs) and incorporated an attention mechanism to model the relationships between nodes. In contrast to GCNs, GATs dynamically aggregate information from neighboring nodes by learning the weights between each node and its neighbors. This adaptive aggregation allows each node to determine the extent of interaction with its neighbors, capturing the complex relationships between nodes. Additionally, GATs support multi-head attention mechanisms, where multiple attention heads are used in each layer for feature aggregation. Each attention head can learn different node relationships, providing richer graph structure information. By utilizing multiple attention heads, a GAT can simultaneously consider multiple node relationships, enhancing the model’s expressive power and generalization ability.
In summary, the existing methods for regional photovoltaic (PV) power prediction have the following limitations:
Compared with the cumulative method and extrapolation method, statistical scale methods are more suitable for regional PV power prediction. However, because different PV power plants are affected by different environmental conditions and other factors, a selected reference PV power plant may not fully represent the characteristics and variations of all PV power plants within the sub-region.
The operation and maintenance of PV power plants can lead to changes in their power generation capabilities. The proportion of power generated by the reference PV power plant may vary over time, which can affect the estimation of the reference PV power plant’s proportion.
Extrapolating the entire region’s PV power based on proportions alone does not take into account the spatiotemporal characteristics and synergistic effects among sub-regions, resulting in less accurate predictions.
To address these existing issues, we propose a distributed regional photovoltaic (PV) power prediction method based on the stacked ensemble algorithm [
24]. The main contributions are as follows:
In the process of selecting representative power plants for each sub-region, we utilize a graph attention network (GAT). The GAT models all the power plants within each sub-region by leveraging its characteristics, representing the connections and interactions among the power plants using a graph structure. By learning the weights and attention distributions between power plants, we integrate and fuse the features and relationships among different sub-regions, ultimately selecting the most representative power plants.
We employ a CNN-LSTM-multi-head attention parallel multi-channel model as the base model for the stacked ensemble algorithm. This approach fully utilizes the strengths of the CNN and LSTM models, enhancing the model’s feature representation capabilities and sequence data modeling abilities. Additionally, we incorporate a multi-head attention mechanism for comprehensive feature weighting and fusion, considering the different feature representations and sequence modeling capabilities. This results in more comprehensive and accurate model outputs.
By using the outputs of the base model as inputs to the meta-model, we incorporate the features and spatiotemporal characteristics of each sub-region into the consideration of the meta-model. The base model can perform individual feature extraction and modeling for each sub-region, capturing the spatiotemporal characteristics within the sub-region. The meta-model then integrates and consolidates the outputs of different sub-regions, better reflecting the characteristics of the entire region and predicting PV power generation.
The organizational structure of this paper is as follows. Section 1 describes the existing methods and problems in the field of regional photovoltaic power generation forecasting and puts forward solutions to these problems.
Section 2 introduces the framework of regional forecasting in this paper.
Section 3 explains the selection of representative power stations.
Section 4 introduces the stacked ensemble algorithm and the base model and meta-model selected for the algorithm. In
Section 5, the feasibility of the proposed method is proven by comparing the model with other excellent models.
Section 6 gives a final summary of this paper.
3. Selecting Representative Power Stations for Sub-Regions
When selecting representative power stations for each sub-region, the power stations are first grouped by region so that stations subject to similar weather conditions fall into the same group, yielding three sub-regions (A, B, and C), as shown in
Figure 2.
This is because power stations in the same area are often affected by similar climatic and environmental conditions, and this partitioning helps reduce the complexity of the data and the impact of noise. After the region is divided, representative power stations are selected from the many power stations in each sub-region. Within a sub-region, there is a temporal and spatial correlation between power stations; that is, the status of a power station may be affected by the surrounding power stations. Most importantly, in the selection of representative power stations, it is necessary to consider not only the relationship between each power station and its neighboring power stations but also the relationships among all power stations in the whole region.
Graph Attention Networks
This article utilizes a graph attention network (GAT) to select representative power plants within each sub-region. The GAT is a graph neural network model designed to capture both the spatial and temporal information among related power plants within the sub-region by introducing attention mechanisms between nodes. Unlike a graph convolutional network (GCN), which performs a simple average aggregation of neighboring nodes, the GAT calculates the attention weights for each node with its neighbors and aggregates them accordingly. This allows the GAT to adaptively model the contributions of different nodes to their neighbors, enabling more flexible capturing of the local structure within the graph. Since each power plant may have varying importance and contributions, this article incorporates a multi-head attention mechanism in the GAT. This mechanism learns different attention weights from multiple perspectives in parallel, based on the relationships and features of the nodes.
Figure 3 illustrates this process.
From the diagram, it is evident that each power plant in the sub-region is depicted as a node (P). The various colored arrows represent distinct attention sets. By continuously learning and incorporating the spatiotemporal features of all other power plants within the region, the attention sets are merged and averaged to yield the final output for each power plant. This method enables the model to consider the impact of each power plant from a global perspective, better capturing the interactions among power plants within the region and improving prediction accuracy.
Through iterative learning and prediction, the power plant with the most accurate predictions is chosen as the representative for the sub-region. This ensures that the selected representative power plant effectively represents the photovoltaic power generation within the entire sub-region, providing a reliable foundation for subsequent analysis and decision making. In cases where the sub-region contains only one power plant, it automatically becomes the representative for the sub-region, as shown in
Figure 4.
In
Figure 4, through the GAT, the representative power station in this sub-region is finally selected: P7. The calculation step takes the features of a group of nodes as the input, for example $h = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_s\}$ with $\vec{h}_i \in \mathbb{R}^F$, where $F$ is the dimension of the features owned by each node and $i = 1, 2, \ldots, s$, where $s$ is the number of nodes. The correlation attention between nodes $i$ and $j$ is described as follows:

$$e_{ij} = a\left(\left[W\vec{h}_i \,\Vert\, W\vec{h}_j\right]\right) \tag{1}$$

$$\alpha_{ij} = \operatorname{softmax}_j\left(e_{ij}\right) \tag{2}$$

$$\alpha_{ij} = \frac{\exp\left(\operatorname{LeakyReLU}\left(e_{ij}\right)\right)}{\sum_{k \in N_i} \exp\left(\operatorname{LeakyReLU}\left(e_{ik}\right)\right)} \tag{3}$$

$$\vec{h}_i' = \sigma\left(\frac{1}{K}\sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} \vec{h}_j\right) \tag{4}$$

In Equation (1), $W$ is the weight matrix, and $a$ is a single-layer feedforward neural network. The features of node $i$ and node $j$ are each multiplied by the weight matrix $W$, and the results are concatenated. After the concatenated result is passed through $a$, the high-dimensional features are mapped to a real number, namely the attention coefficient $e_{ij}$. In Equations (2) and (3), $N_i$ is the neighborhood of node $i$ (the set of other nodes connected to it), and the LeakyReLU activation function is adopted in the normalization operation on $e_{ij}$. Finally, combined with the multi-head attention mechanism, as shown in Equation (4), the new node features $\vec{h}_i'$ are generated, where $K$ is the number of attention heads and $\alpha_{ij}^{k}$ and $W^{k}$ are the normalized attention coefficients and weight matrix of the $k$-th head.
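To make Equations (1)–(4) concrete, the short NumPy sketch below (not the code used in this work; the dimensions, random weights, and fully connected sub-region graph are illustrative assumptions) computes the attention coefficients and the averaged multi-head output for every plant node in a small sub-region.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
s, F, F_out, K = 5, 8, 4, 3          # plants, input/output feature dims, attention heads
h = rng.normal(size=(s, F))          # node features (one row per power plant)
adj = np.ones((s, s), dtype=bool)    # fully connected sub-region graph (assumption)

outputs = []
for k in range(K):                                   # one pass per attention head
    W = rng.normal(size=(F, F_out))                  # shared weight matrix W
    a = rng.normal(size=(2 * F_out,))                # single-layer feedforward network a
    Wh = h @ W
    h_new = np.zeros((s, F_out))
    for i in range(s):
        nbrs = np.where(adj[i])[0]
        e = np.array([a @ np.concatenate([Wh[i], Wh[j]]) for j in nbrs])  # Eq. (1)
        alpha = softmax(leaky_relu(e))                                    # Eqs. (2)-(3)
        h_new[i] = (alpha[:, None] * Wh[nbrs]).sum(axis=0)
    outputs.append(h_new)

h_prime = np.tanh(np.mean(outputs, axis=0))          # average the K heads, Eq. (4)
print(h_prime.shape)                                 # (5, 4): new feature vector per plant
```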
4. Stacked Ensemble Algorithm
The accuracy of PV power prediction in large-scale regions is of the utmost importance for energy planning and operational management. To ensure a reliable energy supply and effective operational decision making, higher-precision models are required for predicting PV power in large-scale regions. Building upon the statistical scale method, this paper divides a large-scale region into multiple sub-regions. To better capture the shared features and similarity relationships among the sub-regions, a combination model based on stacked ensemble algorithms is employed for predicting PV power in the large-scale region.
The stacked ensemble algorithm improves overall prediction performance by combining multiple base models. Its main concept involves using the predictions of the base models as input features and utilizing another meta-model for the final prediction. This algorithm effectively integrates the strengths of multiple models. By incorporating predictions from multiple base models, it can better adapt to new and unseen data. This generalization ability allows the model to demonstrate higher prediction capabilities and increased reliability when encountering various PV power generation scenarios in different sub-regions. Furthermore, the algorithm can be easily expanded to incorporate additional base models and adjusted and improved as needed, providing enhanced scalability and flexibility.
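The following skeleton illustrates this two-level flow on synthetic data; simple Ridge regressors stand in for the CNN-LSTM-MHA base models and the GRU meta-model used in this paper, and for brevity plain train-set predictions feed the meta-model, whereas out-of-fold predictions would normally be preferred to reduce leakage.

```python
import numpy as np
from sklearn.linear_model import Ridge  # stand-in models for a minimal stacking sketch

rng = np.random.default_rng(0)
n_samples, n_features, n_subregions = 1000, 6, 3

# Hypothetical data: one feature matrix and power series per sub-region,
# plus the total regional power as the meta-model target.
X = [rng.normal(size=(n_samples, n_features)) for _ in range(n_subregions)]
y_sub = [x @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_samples) for x in X]
y_region = np.sum(y_sub, axis=0)

split = int(0.8 * n_samples)
base_models, base_preds_train, base_preds_test = [], [], []
for x, y in zip(X, y_sub):
    m = Ridge().fit(x[:split], y[:split])      # level 0: one base model per sub-region
    base_models.append(m)
    base_preds_train.append(m.predict(x[:split]))
    base_preds_test.append(m.predict(x[split:]))

# Level 1: base-model predictions become the meta-model's input features.
Z_train = np.column_stack(base_preds_train)
Z_test = np.column_stack(base_preds_test)
meta = Ridge().fit(Z_train, y_region[:split])
regional_forecast = meta.predict(Z_test)
print(regional_forecast[:5])
```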
4.1. Basic Model
In the process of photovoltaic power generation, many influential characteristics are often involved, such as weather data (sunshine time, temperature, humidity, etc.), geographical location information, and photovoltaic module performance parameters. These data can have highly nonlinear relationships, and there can be temporal and spatial correlations and interaction effects, which also make it difficult for simple machine learning models to capture complex patterns and associations in the data. In this paper, the CNN-LSTM multi-channel parallel (CNN-LSTM (PC)) model is selected in the basic model training stage, which can maximize the advantages of combining the CNN and LSTM approaches, as shown in
Figure 5.
Firstly, the parallel training of the CNN and LSTM models allows for comprehensive utilization of both the temporal and spatial features in the data. The CNN is capable of extracting spatial features from the data, while the LSTM model, with its memory cells and gating mechanisms, selectively remembers and forgets past information. This enables the model to better capture long-term patterns and trends in the power data of sub-regions. Additionally, there may exist spatial correlations in the power data of the region, meaning that adjacent sub-regions may have similar or related power values. Through its recursive structure, LSTM can simultaneously handle both time series and spatial correlations, thereby effectively leveraging the interdependence between sub-regions for prediction. By training these two models in parallel, we can consider the spatiotemporal information within the power plant data simultaneously, leading to more comprehensive modeling and prediction of power generation.
Secondly, through the multi-head attention mechanism, we can merge the outputs of the CNN and LSTM models. The multi-head attention mechanism allows for the weighting and fusion of different feature subspaces, enabling the model to focus more on important features and spatiotemporal relationships. This mechanism enhances the model’s perception of critical features in the power plant data, thus improving the accuracy and robustness of predictions.
Lastly, the final prediction is obtained by mapping the output of the multi-head attention mechanism through fully connected layers. The fully connected layers enable the combination and transformation of features, converting the high-level features extracted by the multi-head attention mechanism into the final prediction output. This structure possesses strong expressive power, being capable of adapting to complex patterns in power plant data and producing accurate prediction results. The structure and related parameters of the model are shown in
Table A1 in
Appendix A.
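A minimal Keras sketch of this parallel multi-channel structure is shown below; the layer sizes, window length, and head count are illustrative assumptions rather than the parameters listed in Table A1.

```python
from tensorflow.keras import layers, Model

timesteps, n_features = 12, 6        # illustrative input window and feature count

inputs = layers.Input(shape=(timesteps, n_features))

# Channel 1 (CNN): local/spatial feature extraction along the time axis.
cnn = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)

# Channel 2 (LSTM): long-term temporal dependencies.
lstm = layers.LSTM(64, return_sequences=True)(inputs)

# Multi-head attention fuses the two channels (query: CNN features; key/value: LSTM features).
fused = layers.MultiHeadAttention(num_heads=4, key_dim=16)(cnn, lstm)
fused = layers.GlobalAveragePooling1D()(fused)

# Fully connected layers map the fused features to the power prediction.
outputs = layers.Dense(64, activation="relu")(fused)
outputs = layers.Dense(1)(outputs)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()
```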
4.1.1. Convolutional Neural Networks
Convolutional neural networks (CNNs) have achieved significant success in image processing, and their feature extraction capability plays a crucial role in time series data as well. Time series data often contain complex patterns and structures, and traditional manual feature engineering is challenging for extracting such features. Through the hierarchical stacking of convolutional operations, a CNN can efficiently capture the spatial features among the data.
Moreover, a specific moment in time series data is usually correlated with its neighboring moments. Convolutional operations can capture local dependencies when processing time series data. By defining appropriate kernel sizes, the CNN can effectively capture local patterns and their evolution in the time series data. Additionally, by using different kernel sizes or multiple layers of convolution, the CNN can achieve a multi-scale representation of the time series data. Smaller kernels can capture detailed features, while larger kernels can capture more macroscopic trend features. Multiple layers of convolution can gradually extract abstract features from the time series data, enabling modeling of different time scales. The CNN’s implementation formulae are as follows:
$$y_j = \sigma\left(\sum_{m=1}^{k} w_m x_{j+m-1} + b\right)$$

$$p_j = \max\left(y_{(j-1)t+1}, y_{(j-1)t+2}, \ldots, y_{(j-1)t+q}\right)$$

where $y_j$ is the output sequence of the convolution operation, $j$ is the index of the output sequence, $k$ is the size of the convolution kernel, $q$ is the size of the pooling window, and $t$ is the step length of the pooling window; $x$, $w$, $b$, and $\sigma$ denote the input sequence, the kernel weights, the bias, and the activation function, and $p_j$ is the pooled output.
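As a small worked example of the two formulae above (assuming a valid, unpadded convolution window, a ReLU activation, and max pooling):

```python
import numpy as np

x = np.array([0.1, 0.4, 0.35, 0.8, 0.75, 0.3, 0.2, 0.6])   # toy input sequence
w = np.array([0.25, 0.5, 0.25])                             # convolution kernel (k = 3)
b = 0.0
k, q, t = len(w), 2, 2                                      # kernel size, pooling window, stride

# Convolution: y_j = sigma(sum_m w_m * x_{j+m-1} + b), with a ReLU as sigma.
y = np.array([max(0.0, np.dot(w, x[j:j + k]) + b) for j in range(len(x) - k + 1)])

# Max pooling with window q and stride t.
p = np.array([y[j:j + q].max() for j in range(0, len(y) - q + 1, t)])
print(y, p)
```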
4.1.2. Long Short-Term Memory
Long short-term memory (LSTM) is a variant of recurrent neural networks (RNNs) which has achieved significant success in handling time series data. Time series data often exhibit long-term dependencies and temporal relationships, and traditional RNNs face challenges in dealing with long-term dependencies due to the vanishing or exploding gradient problem. LSTM effectively addresses these issues by introducing gate mechanisms.
The main idea of LSTM is to store and update information through a memory unit called a “cell”. The cell consists of a forget gate, an input gate, and an output gate, each with learnable weights which determine whether to pass or update information. The forget gate decides whether previously stored memory should be forgotten, the input gate determines how new information is integrated into the memory, and the output gate determines how much of the output memory is passed to the next time step.
The key aspect of LSTM lies in its ability to update and retain long-term memory effectively. Through control of the forget gate and input gate, LSTM can selectively forget or store information, enabling more accurate handling of long-term dependencies. This allows LSTM to capture important patterns and structures in time series data without being limited by vanishing or exploding gradients.
Similar to CNNs, LSTM can also extract more abstract features through the stacking of multiple layers. Each LSTM layer can capture different levels of time scales, ranging from lower-level detailed features to higher-level abstract features. This multi-layer structure facilitates a deeper understanding and modeling of time series data by LSTM. The formulae for each state at time
$t$ are as follows:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

$$h_t = o_t \odot \tanh\left(c_t\right)$$

where $W_i$, $W_f$, $W_o$, and $W_c$ are the weight matrices of the input gate, forget gate, output gate, and memory unit, respectively, $U_i$, $U_f$, $U_o$, and $U_c$ are the weight matrices of the hidden layer, and $b_i$, $b_f$, $b_o$, and $b_c$ are the bias values. The memory unit $c_t$ is updated through the forget gate and the input gate. The forget gate determines how much information from the memory unit $c_{t-1}$ at the previous time is retained, and the input gate determines how much new information $\tilde{c}_t$ is added to the memory unit state.
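A single LSTM time step written directly from these formulae (NumPy, random weights, illustrative dimensions) may help clarify how the gates interact:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hidden = 6, 8
x_t = rng.normal(size=n_in)              # input at time t
h_prev = np.zeros(n_hidden)              # previous hidden state h_{t-1}
c_prev = np.zeros(n_hidden)              # previous cell state c_{t-1}

# W*: input weights, U*: recurrent weights, b*: biases (i, f, o, c).
W = {g: rng.normal(size=(n_hidden, n_in)) for g in "ifoc"}
U = {g: rng.normal(size=(n_hidden, n_hidden)) for g in "ifoc"}
b = {g: np.zeros(n_hidden) for g in "ifoc"}

i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])       # input gate
f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])       # forget gate
o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])       # output gate
c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])   # candidate memory
c_t = f_t * c_prev + i_t * c_tilde                           # updated cell state
h_t = o_t * np.tanh(c_t)                                     # new hidden state
print(h_t.shape)
```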
4.1.3. Multi-Head Attention Mechanism
In the multi-head attention mechanism, each attention head has an independent weight allocation mechanism which determines how much attention each head pays to the input. Each attention head generates a weight coefficient vector which is used to weight the sum of the input and obtain a representation of that head. In this way, multiple attention heads can learn different attention patterns in different feature subspaces, thus providing diversified information expression. The formulae for calculating the multi-head attention mechanism are as follows:
$$\operatorname{MultiHead}(Q, K, V) = \operatorname{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^O$$

$$\mathrm{head}_i = \operatorname{Attention}\left(Q W_i^Q, K W_i^K, V W_i^V\right) = \operatorname{softmax}\left(\frac{Q W_i^Q \left(K W_i^K\right)^{\top}}{\sqrt{d_k}}\right) V W_i^V$$

In these formulae, $Q$, $K$, and $V$ represent the query vector, key vector, and value vector, respectively, $h$ represents the number of heads, $\mathrm{head}_i$ represents the output of head $i$, and $W^O$ represents the output transformation matrix. $W_i^Q$, $W_i^K$, and $W_i^V$ represent the query, key, and value matrices of head $i$, respectively, $d_k$ is the dimension of the key vector, and softmax performs the similarity normalization.
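The same computation in a compact NumPy form (self-attention on a single sequence, with random projection matrices and illustrative dimensions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
seq, d_model, n_heads = 10, 32, 4
d_k = d_model // n_heads
X = rng.normal(size=(seq, d_model))               # here Q = K = V = X (self-attention)

heads = []
for i in range(n_heads):
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(d_k))      # softmax(QK^T / sqrt(d_k))
    heads.append(scores @ V)                      # head_i

W_o = rng.normal(size=(n_heads * d_k, d_model))   # output transformation matrix
output = np.concatenate(heads, axis=-1) @ W_o
print(output.shape)                               # (10, 32)
```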
4.2. Meta-Model
In forecasting photovoltaic power generation over large areas, we pay more attention to the forecast of overall power generation. Because the spatial correlation between power stations in a large region may not be obvious, and because short-term changes and fluctuations sometimes have a greater impact on the overall forecast results, the GRU is used as the meta-model in this paper.
Gated Recurrent Unit
The gated recurrent unit (GRU), compared with LSTM, has fewer gate units, allowing it to more effectively capture rapid changes and short-term patterns in large-area power data. Additionally, large areas may exhibit significant differences due to geographical location, weather, and other factors. The GRU model can better adapt to the variability between different regions and quickly adjust to changes in different areas. This enables it to accurately capture the characteristics and changing patterns of different regions when predicting power in a large area. As a meta-model, the GRU can synthesize predictions from sub-regions and further forecast the power for an entire large area. It can weigh and adjust the predictions from sub-regions, thereby improving the accuracy for the entire large area.
In this paper, a GRU is used as the meta-model of the stacked ensemble algorithm, and the GRU implementation formulae are as follows:

$$z_t = \sigma\left(W_z \cdot \left[h_{t-1}, x_t\right]\right)$$

$$r_t = \sigma\left(W_r \cdot \left[h_{t-1}, x_t\right]\right)$$

$$\tilde{h}_t = \tanh\left(W_h \cdot \left[r_t \odot h_{t-1}, x_t\right]\right)$$

$$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $x_t$ is the input of the current time step, $h_t$ is the hidden state of the current time step, the update gate $z_t$ is used to control the weight between the hidden state of the previous moment and the candidate hidden state, the reset gate $r_t$ is used to control the influence of the previous hidden state on the candidate state, $\tilde{h}_t$ is the candidate hidden state, and $W_z$, $W_r$, and $W_h$ are weight parameters. See Appendix A for details.
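For comparison with the LSTM step above, a single GRU time step follows the formulae directly (NumPy, random weights, illustrative dimensions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
n_in, n_hidden = 6, 8
x_t = rng.normal(size=n_in)
h_prev = np.zeros(n_hidden)

# Each weight acts on the concatenation [h_{t-1}, x_t], as in the formulae above.
W_z, W_r, W_h = (rng.normal(size=(n_hidden, n_hidden + n_in)) for _ in range(3))

z_t = sigmoid(W_z @ np.concatenate([h_prev, x_t]))            # update gate
r_t = sigmoid(W_r @ np.concatenate([h_prev, x_t]))            # reset gate
h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate hidden state
h_t = (1 - z_t) * h_prev + z_t * h_tilde                      # new hidden state
print(h_t.shape)
```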
4.3. Model Evaluation
In this paper, the root mean square error (RMSE), mean absolute error (MAE), and R-squared value ($R^2$) were used as evaluation indices to evaluate the prediction results of the proposed model and the comparison models in different seasons and at different time scales. The specific formulae are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$

$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}$$

In the formulae, $y_t$ and $\hat{y}_t$ are the true value and predicted value at time $t$, respectively, $\bar{y}$ is the average of the true values, and $n$ is the total number of test samples.
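These three indices can be computed directly; a small NumPy helper with toy values (illustrative only) is given below.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, and R^2 as defined above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    mae = np.mean(np.abs(y_true - y_pred))
    r2 = 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, r2

print(evaluate([10.0, 12.5, 9.8, 11.1], [9.6, 12.9, 10.2, 11.0]))
```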
5. Experimental Research
5.1. Data Description
The experimental hardware set-up for this study included a 2.5 GHz Intel(R) Core(TM) i7-11700H CPU with 32.00 GB of memory, and implementation was performed using the TensorFlow framework and the Python language. The experimental data for this study were provided by the Desert Knowledge Australia Solar Centre (DKASC) and originated from the Yulara Solar System in the Ayers Rock region of Australia. Installed in 2014, the Yulara Solar System is an operating 1.8 MW solar photovoltaic plant which was developed with the support of the Australian Renewable Energy Agency (ARENA). Comprising five sub-systems distributed across the local township of Yulara, it sits beside Central Australia’s renowned landmark, Uluru (Ayers Rock), and generates electricity for the local grid.
There were a total of 8 meteorological forecast-related fields: Wind_Speed, Temperature, Global_Horizontal_Radiation, Wind_Direction, Max_Wind_Speed, Air_Pressure, Pyranometer_1, and Pyranometer_2. The details are shown in
Table 1. The dataset can be publicly downloaded from the website corresponding to [
25].
Each dataset contained environmental factors and photovoltaic power output data collected at 5 min intervals, resulting in 288 data points per day. Due to human error, communication failures, or other issues during the data collection process, there may have been outliers and missing data. In this study, the 3 sigma criterion was applied to identify and remove outliers. This criterion defines data points which exceed three times the standard deviation as outliers. For missing data points, the Hermite interpolation method was used to estimate reasonable values based on the values and derivative information of existing data points.
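A sketch of this preprocessing step is given below; it uses SciPy’s PchipInterpolator (a piecewise cubic Hermite interpolating polynomial) as one possible Hermite-type interpolator, and the series values are synthetic, so the exact routine used in the study may differ.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import PchipInterpolator

def clean_series(s: pd.Series) -> pd.Series:
    """Apply the 3-sigma criterion and fill the resulting gaps by Hermite-type interpolation."""
    mu, sigma = s.mean(), s.std()
    s = s.where((s - mu).abs() <= 3 * sigma)      # outliers (and existing gaps) become NaN
    pos = np.arange(len(s))
    known = s.notna().to_numpy()
    interp = PchipInterpolator(pos[known], s.to_numpy()[known])
    return pd.Series(interp(pos), index=s.index)

# Synthetic 5 min power series with one artificial spike and one missing sample.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=48, freq="5min")
power = pd.Series(3.0 + 0.2 * rng.standard_normal(48), index=idx)
power.iloc[10] = 40.0        # outlier removed by the 3-sigma rule
power.iloc[20] = np.nan      # gap filled by interpolation
print(clean_series(power).iloc[[10, 20]])
```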
As different seasons can have varying impacts on photovoltaic power generation, considering the influence of factors such as sunlight and temperature, the relationship between photovoltaic power generation and meteorological factors may exhibit different patterns at different time scales. Therefore, the dataset was divided into seasons: spring, summer, autumn, and winter. Furthermore, for each season, the data were divided into different time intervals: 1 h, 3 h, and 5 h. This division allowed for better capturing of seasonal patterns and trends between photovoltaic power generation and meteorological factors. Additionally, by modeling multiple time scales, the dynamic characteristics of the data could be more comprehensively captured.
5.2. Correlation Analysis
In photovoltaic power prediction tasks, there are numerous features which can potentially affect the power output. Selecting appropriate input features is crucial for establishing an accurate prediction model. Due to the presence of a large number of potential influencing factors, feature selection helps to identify features strongly correlated with the power output, reducing the interference of redundant information and noise. Choosing highly correlated features as inputs enables better capture of the factors influencing power and improves the accuracy and interpretability of the prediction model.
This study employed the maximal information coefficient (MIC) and the Pearson correlation coefficient for feature selection and partitioning. The MIC, as a non-parametric measure of correlation, can identify associations between variables of any type, including nonlinear relationships. This capability allows it to discover complex correlations hidden within photovoltaic power data, assisting in finding features strongly correlated with power output without being limited by assumptions about feature distributions. The formula for calculating the MIC is as follows:

$$\mathrm{MIC}(x, y) = \max_{a \times b < B} \frac{I(x; y)}{\log_2 \min(a, b)}$$

where $I(x; y)$ is the amount of mutual information between $x$ and $y$, $a$ and $b$ are the numbers of grid units, and $B$ is usually set to the total number of samples raised to the power of 0.6.
The Pearson correlation coefficient is a common linear correlation measure which is suitable for quantifying the strength of a linear relationship. By using the Pearson correlation coefficient, features with a high linear correlation with the power output can be identified. The relevant calculation formula is as follows:

$$\rho_{x,y} = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$$

where $\operatorname{cov}(x, y)$ is the covariance of $x$ and $y$, and $\sigma_x$ and $\sigma_y$ are the standard deviations of $x$ and $y$, respectively.
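A sketch of this screening step is shown below. Pearson correlations are computed with pandas; the MIC is computed with the third-party minepy package (assumed here for illustration, with alpha = 0.6 matching the grid-size exponent mentioned above), and the data frame is synthetic.

```python
import numpy as np
import pandas as pd
from minepy import MINE   # assumed third-party dependency for the MIC calculation

def screen_features(df: pd.DataFrame, target: str = "Power") -> pd.DataFrame:
    """Rank candidate features by their Pearson correlation and MIC with the power output."""
    mine = MINE(alpha=0.6, c=15)
    rows = []
    for col in df.columns.drop(target):
        pearson = df[col].corr(df[target])                              # linear correlation
        mine.compute_score(df[col].to_numpy(), df[target].to_numpy())   # MIC (nonlinear)
        rows.append({"feature": col, "pearson": pearson, "mic": mine.mic()})
    return pd.DataFrame(rows).sort_values("mic", ascending=False)

# Synthetic demonstration frame; in the study, the fields of Table 1 plus the PV power are used.
rng = np.random.default_rng(0)
rad = rng.uniform(0, 1000, 500)
demo = pd.DataFrame({
    "Global_Horizontal_Radiation": rad,
    "Temperature": 20 + 0.01 * rad + rng.normal(0, 2, 500),
    "Power": 0.0015 * rad + rng.normal(0, 0.05, 500),
})
print(screen_features(demo))
```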
Correlation analysis was carried out between the historical data of actual photovoltaic power generation, relevant meteorological data, and the characteristics of the photovoltaic panels themselves. The correlation analysis results are shown in
Figure 6.
The results show that Wind_Speed (wind speed), Temperature (temperature), Radiation (radiation), Max_Wind_Speed (maximum wind speed), Pyranometer_1 (probe 1’s temperature), and Pyranometer_2 (probe 2’s temperature) had a great influence on photovoltaic power generation. Therefore, these influencing factors were selected as features to be input into the model.
5.3. Experimental Results
5.3.1. Comparison Experiments for Sub-Region Representative Power Station Selection
In
Section 3, the selection method for the sub-regional representative power stations was introduced: the power station with the best GAT-based prediction result is chosen as the representative power station of its sub-region. Sub-region A, for example, had five power stations: DG, SS, SD-1A, SD-2A, and SD-3A. By constructing a GAT, the power of the five power stations was predicted and evaluated using the RMSE and MAE indices.
Table 2,
Table 3,
Table 4 and
Table 5 list the results.
Based on the data in
Table 2,
Table 3,
Table 4 and
Table 5, significant observations can be made regarding the prediction accuracy. The RMSE and MAE values for station SD-1A were noticeably smaller than those of the other power stations in the sub-region, indicating that station SD-1A exhibited exceptional prediction accuracy. Taking into account the aforementioned metrics, we can conclude that station SD-1A achieved the best prediction results by iteratively learning and integrating the spatiotemporal features from all other power stations in the region. Therefore, station SD-1A can be considered representative of the entire sub-region. Furthermore, tests conducted across different seasons demonstrate that station SD-1A, as a representative power station, exhibited superior generalization capabilities, performing well in predictions across diverse seasons. This further confirms the superiority of station SD-1A in capturing seasonal variations and its predictive abilities. Based on these results, we selected SD-1A as the representative power station for region A. The same validation process was carried out for regions B and C, identifying representative power stations for their respective sub-regions.
5.3.2. Sub-Region Basic Model Comparison Test
In this study, a model based on the stacked ensemble algorithm was employed to predict the distributed photovoltaics (PVs) in regional areas. The basic model was used to predict the power generation in the sub-regions, and its output served as the input for the meta-model, which was used to predict power generation in the larger region. The accuracy of the basic model’s predictions directly impacts the quality of the input data for the meta-model. If the basic model’s predictions exhibit significant errors, then these errors will propagate to the meta-model, thereby affecting the prediction results for the larger region’s power generation. Additionally, if the selected basic model can adequately learn and capture the spatiotemporal features of the sub-regions, then the meta-model will benefit from these valuable feature representations. Conversely, if the selected basic model has weak feature learning capabilities, then it may not provide sufficient information for predicting the larger region, resulting in decreased prediction accuracy. Liu et al. [
26] demonstrated the feasibility and effectiveness of parallel network structures in the PV domain by utilizing multimodal decomposition and combining parallel bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN). Compared with traditional single-channel networks, parallel network structures exhibit stronger generalization and robustness.
In this paper, model 2 (CNN-LSTM-MHA (PC)) was used as the basic model and compared with model 1 (CNN-GRU-MHA (PC)) and model 3 (CNN-BiLSTM-MHA (PC)) in different seasons and different time steps. Taking region A as an example, the model evaluation results are shown in
Figure 7.
In this experiment, dividing the data according to different seasons helped to better capture the seasonal patterns and trends between photovoltaic power generation and meteorological factors. Different seasons’ meteorological factors, such as sunlight and temperature, have varying impacts on photovoltaic power generation. Additionally, predictions were made for 1 h, 3 h, and 5 h ahead. By forecasting the photovoltaic power generation at different time points, the model’s robustness and practical benefits can be better evaluated. From
Figure 7a, it can be seen that model 2’s MAE metric decreased by an average of 1.77 kW and 1.1 kW compared with model 1 and model 3 in the 1 h, 3 h, and 5 h predictions. Similarly, in
Figure 7b, model 2 showed an average reduction of 2.63 kW and 0.22 kW compared with the other two models. The results indicate that by leveraging the strengths of the CNN model in capturing the local features of the data and LSTM in capturing the long-term dependencies, the CNN-LSTM-MHA (PC) model constructed in this study and trained in parallel can maximize model performance. When LSTM was replaced by a GRU and BiLSTM under the same model structure, although good prediction results were achieved, there were still some issues. For instance, the GRU struggled to capture complex data features due to fewer gating units, and BiLSTM required a larger number of parameters, making it prone to overfitting. The model evaluation metrics for sub-region A are shown in
Table 6,
Table 7 and
Table 8, while detailed data for sub-regions B and C can be found in
Table A2,
Table A3,
Table A4,
Table A5,
Table A6 and
Table A7 of
Appendix A.
5.3.3. Large-Area Meta-Model Comparison Test
Large regions typically encompass a wide range of geographical and climatic conditions, making it challenging to select suitable data features for large-area prediction. Additionally, there may be strong spatial correlations among data from different sub-regions. The stacked ensemble algorithm leverages the outputs of base models as new features for the meta-model, addressing the inadequacies of large-area data features and enhancing spatial correlations among different sub-regions. In this study, a GRU was employed as the meta-model to forecast power generation in a large region. To further validate the feasibility of the proposed model, comparative experiments were conducted with classical models such as CNN-LSTM-MHA (SC), BiLSTM-CNN (SC), and the traditional extrapolation method. The prediction data for the sub-regions were all based on the results predicted by the CNN-LSTM-MHA (PC) model. The extrapolation method involved summing the predictions from individual sub-regions. The comparison results are shown in
Figure 8, where the data were randomly selected from one day of the final results.
From the above comparison of the prediction results, it can be observed that the proposed method, the other classical models, and the traditional method are all capable of capturing the trends in photovoltaic power generation, indicating that under normal circumstances they can provide a certain level of prediction accuracy. However, during extreme weather conditions, there were significant abnormal fluctuations in photovoltaic power generation. In comparison with the other methods, the method proposed in this study can more accurately capture the detailed features of the data and demonstrates better fitting at the turning points of the curves. This suggests that this method can more precisely predict the changes in photovoltaic power generation at different time points and identify the inflection points of the power curve. The specific evaluation metrics are shown in
Table 9,
Table 10 and
Table 11.
Based on the data tables, it is evident that regardless of the season or time scale, using a GRU as the meta-model yielded significantly better results than the other classical models. Compared with BiLSTM-CNN (SC), CNN-LSTM-MHA (SC), and the extrapolation method, respectively, our proposed model reduced the MAE, averaged over the different time scales, by 5.49 kW, 8.28 kW, and 0.95 kW in spring; 4.97 kW, 5.99 kW, and 0.87 kW in summer; 7.9 kW, 9.73 kW, and 2.56 kW in autumn; and 14.07 kW, 17.68 kW, and 6.19 kW in winter. Therefore, in scenarios where there are fewer features for a large area, the simplicity of the GRU model’s gating mechanism allows it to swiftly and effectively capture crucial information from the data, leading to superior outcomes.
Although the extrapolation method slightly outperformed the other two models in certain metrics, it still fell short compared with our proposed model. The strength of the extrapolation method lies in its simplicity and ease of implementation, but its performance heavily relies on the accuracy of the base model and reasonable sub-region divisions. Insufficient accuracy in the base model or overly detailed sub-region divisions can lead to error accumulation and decreased prediction accuracy.
Obtaining features from large-area data is challenging, resulting in a lower feature count. In such scenarios, models like BiLSTM-CNN (SC) and CNN-LSTM-MHA (SC), due to their complex structures, are prone to getting stuck in local optima during training, leading to significantly increased training times.
This indirectly underscores the importance of choosing the right models when dealing with diverse data. For sub-regions, where data complexity and influential features are high, a single model might not adequately capture essential information, necessitating the utilization of different models’ strengths by adjusting network structures for training and prediction. In contrast, for large areas with fewer features, complex model structures may not necessarily improve predictive accuracy; instead, they may increase the training time and risk of overfitting.