1. Introduction
In recent years, rapid economic development and growing reliance on fossil fuels such as coal and oil have seriously threatened the earth’s ecological environment [1]. To alleviate environmental pressures, countries have implemented various policies [2]. Among these, owing to its non-polluting nature, renewable energy development has emerged as a key environmental governance strategy, gradually replacing traditional fossil fuels as the cornerstone of future energy systems. Promoting new energy generation has also become a critical initiative in global environmental governance. New energy generation encompasses wind power, photovoltaic (PV) power generation, geothermal power generation, and other forms of energy. Solar power, characterized by its plentiful supply, environmental friendliness, and sustainability, has seen widespread adoption across the globe [3,4]. Distributed PV (DPV) power generation has rapidly gained popularity due to its decentralized, clean, and highly efficient nature [5]. According to statistics from the International Energy Agency, global DPV capacity is projected to reach 917.1 GW in 2023 and 3467.1 GW by 2030. However, PV power generation is characterized by significant intermittency, randomness, and uncertainty, and the large-scale integration of DPV systems significantly complicates the operation and control of distribution networks. Accurate DPV power forecasting is crucial for the operation and management of active distribution networks, for ensuring their safe and stable operation, and for the effective utilization of DPV energy. Moreover, these forecasts enable PV power sellers to adjust their market strategies promptly, minimizing potential losses. Therefore, an accurate forecast of DPV power plant output is essential.
Ultra-short-term PV power forecasting can be categorized into three main approaches: physical modeling methods, statistical methods, and machine learning methods. The physical modeling method develops a PV power model by applying the principles of solar radiation transmission, energy conversion in PV systems, and associated physical phenomena [6,7,8]. This method utilizes Numerical Weather Prediction (NWP) data along with photovoltaic panel data to achieve highly accurate and interpretable forecasts. Statistical methods, in contrast, rely on historical data and statistical principles to forecast future PV power outputs. They analyze historical power output data along with related influencing factors to identify correlation patterns and forecast future output power accordingly. Frequently used statistical methods include the Autoregressive Moving Average (ARMA) model and its variations [9,10,11], and the Autoregressive Integrated Moving Average (ARIMA) model [12,13]. Machine learning approaches mainly include models such as Support Vector Machines (SVMs) [14], Artificial Neural Networks (ANNs) [15], and Extreme Learning Machines (ELMs) [16]. Among these, the Back Propagation (BP) neural network has gained significant attention and has been widely applied due to its flexibility and adaptability in handling nonlinear relationships [17,18]. Traditional ANNs face challenges in processing large amounts of input data, including issues such as gradient vanishing and explosion. Consequently, deep learning, encompassing methods such as Convolutional Neural Networks and Long Short-Term Memory networks, has garnered significant attention from researchers [19,20]. These models demonstrate superior capabilities for feature learning and information extraction, achieving notable success in PV power forecasting.
While many methods developed for centralized PV plants are applicable to distributed PV power forecasting, the latter presents unique challenges. DPV plants are typically small-scale and geographically dispersed, located across diverse environments such as cities, villages, and rooftops. This distribution leads to significant spatial and temporal variability in weather, light conditions, temperature, and other environmental factors. Enhancing the accuracy of DPV power forecasting by accounting for spatiotemporal correlations between PV power plants is, therefore, a critical research focus [21,22,23,24,25,26]. Reference [21] utilized correlation information to improve the accuracy of power forecasting, using the power of neighboring PV sites as model inputs to forecast the power of the target station. Reference [22] employed Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to extract spatial and temporal features, respectively. The temporal and spatial features were then fused using LSTMs to effectively mine the spatiotemporal correlation characteristics of DPV systems. Reference [23] first linked highly correlated PV power stations to construct a topological graph structure. Based on this graph, an improved spatiotemporal graph network model was constructed to comprehensively explore the spatiotemporal characteristics of regional PV power plants. Reference [24] proposed a partition-based short-term regional DPV power generation forecasting method that considers spatiotemporal correlations. It employed Graph Convolutional Networks (GCNs) to extract spatial correlation features and LSTM networks to capture the evolution characteristics of these dynamic spatial correlations, thereby establishing power forecasting models for power stations under different weather types. Reference [25] applied Dynamic Directed Graph Convolutional Networks (DDGCNs) to ultra-short-term power forecasting for regional DPV systems. To capture the dynamic and directed adjacency relationships between graph nodes, a temporal attention mechanism was introduced and combined with the directed GCN model. This approach allows the dynamic relationships between DPV sites to be considered.
However, existing spatiotemporal correlation methods rely on historical operational data from all sites, which are pooled through centralized data sharing to construct the model. Such centralized data sharing significantly increases the risk of data leakage. Spatiotemporal correlation mining typically involves collecting historical data from multiple sites, such as power output, meteorological conditions, and equipment operation status, potentially exposing sensitive operational details of each PV site. If stored or shared centrally without synchronization or robust encryption, these data are vulnerable to privacy breaches. Additionally, distributed PV sites often belong to different electricity sellers, who may refuse to share original PV power data to safeguard their commercial interests.
Federated learning has emerged as an effective approach to address this challenge [27]. In this framework, the historical operational data of each distributed PV system are processed locally without centralized sharing or exposure to other participants. The learning model is trained locally using historical data, and only the model parameters are shared for global aggregation on a central server [28,29]. However, the shared model parameters contain limited information, making it challenging to extract spatial correlation among distributed PV sites. While existing federated learning frameworks effectively safeguard data privacy, they struggle to significantly enhance power forecasting accuracy. Therefore, an important research gap is how to improve the model’s ability to mine and utilize the spatiotemporal correlation information between distributed PV power stations while ensuring data privacy under the existing federated learning framework.
Additionally, current distributed PV power forecasting predominantly relies on traditional point forecasting methods, which yield a single deterministic value for the output power. However, distributed PV power is influenced by various unstable factors, such as weather conditions, which introduce significant stochasticity and uncertainty. Probabilistic forecasting of distributed PV power enables the effective quantification of forecast uncertainty, offering more comprehensive forecasting information than point forecasting and providing critical guidance for the operation and regulation of active distribution networks [30,31]. Depending on the modeling approach, probabilistic forecasting methods can be categorized into parametric and nonparametric methods [32].
The parametric method presumes that the forecasting probability density function adheres to a known distribution model, such as normal or gamma distributions. It estimates the parameters of this model based on the forecasting errors, which are subsequently utilized for probabilistic forecasting [33,34]. The advantage of parametric estimation lies in leveraging the distributional form of existing data for inference, enabling the derivation of more accurate parameter estimates. However, this approach requires strict assumptions about the data’s distribution. When the data deviate from the assumed distribution, significant errors can occur. In practice, PV output errors often fail to conform to such strict distributional assumptions.
In contrast, nonparametric methods do not rely on a priori assumptions but directly model and estimate the shape of the power distribution, offering high flexibility for capturing various forms of data distributions [35]. Reference [36] proposed a nonparametric PV power probabilistic forecasting model in which independent LSTM deterministic forecasting models were developed using historical PV output data and NWP data, and nonparametric probabilistic forecasting was achieved through quantile regression averaging (QRA). Similarly, reference [37] conducted nonparametric probabilistic forecasting of power generation by employing a direct quantile regression method that integrates Extreme Learning Machines and quantile regression, demonstrating strong forecasting performance. Building on the advantages of nonparametric methods, this study recognizes that PV power fluctuations strongly influence forecasting errors: violent fluctuations tend to yield larger errors, while gentler fluctuations correspond to smaller errors. Consequently, forecasting errors do not necessarily satisfy the independent and identically distributed assumption [38], as their distribution strongly correlates with the characteristics of PV power fluctuations. However, existing nonparametric probabilistic forecasting methods predominantly focus on improving forecasting models while often overlooking data dependency relationships, resulting in insufficient refinement of the error distribution characteristics.
In summary, existing DPV cluster power generation forecasting methods have three significant research gaps. Firstly, they face severe data security issues when mining and utilizing cluster-related information. Secondly, they lack effective means for mining and utilizing associated features under the federated learning framework. Thirdly, they seldom provide probabilistic forecasting results to quantify uncertainty, and most of them ignore the correlation between forecasting results and forecasting errors.
To address these issues, this paper proposes a privacy-preserving deep federated learning method for probabilistic forecasting of ultra-short-term power generation from DPV systems and designs a collaborative feature federated learning framework. For the central server, it aggregates global models based on local model parameters and installed capacity uploaded by each station. At the same time, it gathers temporal features from various clients to generate global spatial features. For local client sites, a Transformer-based autoencoder is used to extract local time-series features. The encoder couples local time series features with global spatial features to generate spatiotemporal correlation features, which are adaptively weighted through an attention mechanism to focus the model on more critical features for forecasting. Based on point forecasting results, a joint probability distribution of forecasting values and errors is constructed, and the upper and lower bounds of errors are determined through inverse transformation, thereby obtaining the interval for probabilistic forecasting. Thus, under the premise of ensuring privacy, the dual-layer information interaction mechanism of model interaction and feature sharing is utilized to construct spatiotemporal correlation features that combine local client forecasting models, fully considering the spatiotemporal correlation relationships between different power stations to improve the forecasting accuracy of local models. By considering the dependencies between data, the distribution patterns of errors under different forecasting outcomes are refined, thus enhancing the accuracy of probabilistic forecasting.
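The server-side aggregation step described above can be illustrated with a minimal sketch. It assumes that each station's parameters are weighted in proportion to its installed capacity; the text only states that aggregation is "based on" parameters and installed capacity, so the exact weighting rule, the function name `aggregate_by_capacity`, and the toy parameter dictionaries are all hypothetical.

```python
import numpy as np

def aggregate_by_capacity(local_params, capacities):
    """Average per-station parameter dicts, weighted by installed capacity.

    Assumption: weights are proportional to installed capacity. The paper
    does not spell out the exact weighting rule in this section.
    """
    weights = np.asarray(capacities, dtype=float)
    weights = weights / weights.sum()
    keys = local_params[0].keys()
    return {
        k: sum(w * np.asarray(p[k], dtype=float)
               for w, p in zip(weights, local_params))
        for k in keys
    }

# Two hypothetical stations with installed capacities of 1 MW and 3 MW.
station_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
station_b = {"w": np.array([3.0, 6.0]), "b": np.array([4.0])}
global_params = aggregate_by_capacity([station_a, station_b], [1.0, 3.0])
```

Only parameter arrays and scalar capacities cross the station boundary in this sketch; raw power series never leave the client.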
The main contributions of this paper are as follows:
(1) In existing federated learning methods, information exchange between PV sites during the aggregation phase at the central server mostly relies on the aggregation of model parameters. This approach carries a limited amount of information, leading to insufficient shared information and failing to improve forecasting accuracy. To address the issue of low forecasting accuracy caused by inadequate information sharing in traditional federated learning methods, this paper introduces the construction of global features to fully compensate for this deficiency. By aggregating local time-series features from each site through the central server, the volume of information exchanged between sites is increased. Under the premise of protecting data privacy, this method thoroughly considers the correlations between different sites, thereby improving power forecasting accuracy;
(2) When performing power forecasting at local stations, existing federated learning methods predominantly rely on the global model distributed by the central server, failing to account for the unique time-series characteristics of local sites. To improve this situation, this paper proposes that when conducting power forecasting at local PV sites, the Transformer model should be used to extract key time-series features specific to the local site. These extracted features are then combined with the global features issued by the central server to form spatiotemporal correlation features. On the basis of uncovering spatiotemporal correlations among sites, this approach fully considers the unique time-series characteristics of each site, further enhancing forecasting accuracy;
(3) Traditional probabilistic forecasting methods often focus solely on improving model performance while frequently overlooking the uncertainty in forecasting and lacking studies on the correlation between the distribution of forecasting errors and the fluctuation characteristics of PV output. To address this gap, this paper proposes constructing a joint probability distribution between these two aspects, modeling entirely based on the dependent relationship between forecasting values and forecasting errors. This approach fully considers the correlations within the data. Through comparisons with existing probabilistic forecasting methods, the superiority of the proposed joint distribution model has been verified.
Compared with recently published articles, the main differences of this paper are shown in Table 1.
3. Results and Discussion
3.1. Dataset Description
The experimental data originate from the power generation data of 30 DPV stations located in a certain area in northern China. The data span from 1 July 2023 to 1 July 2024, with a time resolution of 15 min. To ensure data quality and reduce the impact of outliers on power forecasting performance, the original data underwent cleaning and repair. Specifically, abnormally large values exceeding capacity, abnormally small values such as negative outputs, and missing power values were corrected. Data repair was achieved by replacing the abnormal points with the average power values of the three normal points before and after them. Subsequently, the cleaned dataset was divided into training, validation, and test sets at a ratio of 8:1:1, respectively used for deterministic point forecasting training, constructing a joint distribution model using validation set data, and testing the effectiveness of the proposed method with test set data. During the forecasting process, all power values less than zero were set to zero.
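The cleaning and splitting procedure above could be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes that "the three normal points before and after" means up to three valid neighbours on each side, that missing values are stored as NaN, and that the 8:1:1 split is chronological. The names `repair_power` and `split_811` are hypothetical.

```python
import numpy as np

def repair_power(power, capacity, k=3):
    """Replace abnormal samples with the mean of nearby normal samples.

    A sample is abnormal if it is missing (NaN), negative, or above the
    installed capacity. Up to k normal points on each side are averaged.
    """
    power = np.asarray(power, dtype=float)
    bad = np.isnan(power) | (power < 0) | (power > capacity)
    good_idx = np.flatnonzero(~bad)
    repaired = power.copy()
    for i in np.flatnonzero(bad):
        before = good_idx[good_idx < i][-k:]   # nearest k normal points before i
        after = good_idx[good_idx > i][:k]     # nearest k normal points after i
        neighbours = np.concatenate([before, after])
        if neighbours.size:
            repaired[i] = power[neighbours].mean()
    return repaired

def split_811(series):
    """Chronological 8:1:1 split (the ordering is an assumption)."""
    n = len(series)
    n_train, n_val = int(n * 0.8), int(n * 0.1)
    return series[:n_train], series[n_train:n_train + n_val], series[n_train + n_val:]

raw = [1.0, 2.0, 3.0, -5.0, 5.0, 6.0, 7.0]   # index 3 is a negative outlier
repaired = repair_power(raw, capacity=10.0)
train_set, val_set, test_set = split_811(np.arange(10.0))
```

Averaging only over normal neighbours keeps one outlier from contaminating the repair of an adjacent outlier.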
3.2. Evaluation Metrics
During the point forecasting phase, this paper adopts the Normalized Root Mean Square Error (NRMSE) and Accuracy (ACC) as evaluation metrics, as shown in Equations (28) and (29). NRMSE considers the relative error between forecasting values and actual values, while ACC evaluates the model’s performance over the entire forecasting period by comprehensively considering the 16-step results of forecasting, thus avoiding the limitation of assessing model performance based solely on single-point forecasting results.
where $P_{\max}$ represents the maximum power, $T$ indicates the length of the forecasting time, $\hat{P}_t$ and $P_t$ are the forecasted and true values at time $t$, $\hat{P}_{t,i}$ and $P_{t,i}$ are the forecasted and true values at the $i$th step of forecasting at time $t$, and $n$ is the number of forecasting points evaluated for that day.
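As a rough illustration of the two point-forecasting metrics, the sketch below computes an NRMSE normalized by the maximum power, which follows directly from the symbol definitions, and a 16-step accuracy taken as one minus the capacity-normalized RMSE over all steps. The latter form is an assumption made for illustration; the exact expressions of Equations (28) and (29) should be taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def nrmse(forecast, actual, p_max):
    """Root mean square error over T points, normalized by maximum power."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((forecast - actual) ** 2)) / p_max

def multi_step_acc(forecast, actual, p_max):
    """Assumed form: 1 - normalized RMSE over all n x 16 step-wise errors.

    forecast and actual have shape (n, steps), one row per forecast origin.
    """
    err = (np.asarray(forecast, dtype=float) - np.asarray(actual, dtype=float)) / p_max
    return 1.0 - np.sqrt(np.mean(err ** 2))

point_nrmse = nrmse([2.0, 4.0], [1.0, 6.0], p_max=10.0)
day_acc = multi_step_acc([[2.0, 4.0], [1.0, 6.0]],
                         [[1.0, 6.0], [1.0, 6.0]], p_max=10.0)
```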
During the probabilistic forecasting phase, the Prediction Interval Coverage Probability (PICP), Prediction Interval Normalized Average Width (PINAW), and Skill Score (SS) are adopted as evaluation metrics, as shown in Equations (30)–(32). PICP represents the percentage of actual values that fall within the forecasting intervals, reflecting the accuracy of the model. PINAW measures the width of the forecasting intervals, indicating the sensitivity of the model. SS provides a comprehensive evaluation, rewarding narrower intervals but penalizing if the target value falls outside the forecasting interval. The value of this metric is non-positive, and, the closer it is to 0, the better the probabilistic forecasting performance.
where $N$ represents the sample size, and $p_{1i}$ and $p_{2i}$ represent the upper and lower boundaries of the forecasting interval for the $i$th sample.
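The three interval metrics can be sketched as below. PICP and PINAW follow their usual definitions. For SS, the sketch assumes the widely used interval score, which charges each interval in proportion to its width and adds a penalty when the observation falls outside, since that form matches the non-positive behaviour described above; the paper's Equation (32) may differ in detail. The normalization range `norm_range` and the function name are hypothetical.

```python
import numpy as np

def interval_metrics(actual, lower, upper, alpha, norm_range):
    """Return (PICP, PINAW, SS) for intervals with nominal coverage 1 - alpha."""
    actual, lower, upper = (np.asarray(a, dtype=float)
                            for a in (actual, lower, upper))
    width = upper - lower
    inside = (actual >= lower) & (actual <= upper)
    picp = inside.mean()                         # share of covered observations
    pinaw = width.mean() / norm_range            # mean width, normalized
    below = np.clip(lower - actual, 0.0, None)   # miss distance under the interval
    above = np.clip(actual - upper, 0.0, None)   # miss distance over the interval
    # Assumed interval score: width cost plus a penalty for each miss.
    ss = np.mean(-2.0 * alpha * width - 4.0 * (below + above))
    return picp, pinaw, ss

picp, pinaw, ss = interval_metrics(
    actual=[5.0, 12.0, 1.0],
    lower=[4.0, 6.0, 2.0],
    upper=[8.0, 10.0, 6.0],
    alpha=0.1,
    norm_range=20.0,
)
```

In this toy case only the first observation is covered, so PICP is 1/3, and the two misses dominate the score.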
3.3. Experimental Setup
To demonstrate the effectiveness of the proposed method in this paper, four additional models are set up as control models in addition to the method proposed in this paper:
Proposed method: A joint distribution probabilistic forecasting model based on collaborative feature federated learning and a Transformer autoencoder;
Method 1: A joint distribution probabilistic forecasting model based on a traditional federated learning framework and a Transformer autoencoder model;
Method 2: A joint distribution probabilistic forecasting model using a Transformer autoencoder built solely on local data for each station;
Method 3: A joint distribution probabilistic forecasting model using a GRU built solely on local data for each station;
Method 4: A traditional probabilistic forecasting model based on collaborative feature federated learning and a Transformer autoencoder.
In the settings of these methods, Method 1 involves only the exchange of model parameter information without aggregating global features. Methods 2 and 3, while protecting the data privacy of clients, sacrifice inter-station information sharing and do not fully utilize the spatiotemporal correlations between stations. Method 4 considers only the distribution patterns of forecasting errors without accounting for the dependencies between forecasting values and forecasting errors. By comparing the proposed method with Methods 1 and 2, we verify that the proposed method achieves more accurate forecasting results while ensuring data privacy. Comparing Method 2 with Method 3 verifies the forecast performance of the local client Transformer autoencoder model. Comparing the proposed method with Method 4 validates the accuracy of probabilistic forecasting based on joint probability distribution. The input to each forecasting model is the power from the previous 96 time steps of the forecast period, and the output is the power for the next 16 time steps, achieving ultra-short-term power forecasting.
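With 15 min resolution, 96 input steps cover the previous 24 h and 16 output steps cover the next 4 h. A minimal sketch of how such input/output pairs might be cut from a station's power series is given below; the stride of one step and the function name `make_windows` are assumptions, not details stated in the text.

```python
import numpy as np

def make_windows(series, n_in=96, n_out=16):
    """Slice a 1-D series into (input, target) pairs with a stride of one step."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_in - n_out + 1
    if n_samples <= 0:
        raise ValueError("series is shorter than one input/output window")
    start = np.arange(n_samples)[:, None]          # first index of each window
    x = series[start + np.arange(n_in)]            # shape (n_samples, n_in)
    y = series[start + n_in + np.arange(n_out)]    # shape (n_samples, n_out)
    return x, y

series = np.arange(200.0)   # stand-in for one station's power series
x, y = make_windows(series)
```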
In the point forecasting phase, this paper employs grid search to optimize the selection of hyperparameters (learning rate, batch size, number of iterations) for the forecasting model. Initially, based on the scale of the experimental data and the complexity of the model, three different values are set for each hyperparameter. Then, by means of a grid search, the method seeks the optimal combination of hyperparameters within the predefined hyperparameter space. Among all combinations of hyperparameters, the one that yields the best performance on the validation set is selected as the final hyperparameter configuration. Through comparison, this paper selects a learning rate of 0.001, a batch size of 128, and 100 epochs for the Transformer point forecasting model. For the GRU point forecasting model, a learning rate of 0.001, a batch size of 128, and 150 epochs are chosen.
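The search procedure can be sketched as an exhaustive loop over the 3 × 3 × 3 = 27 combinations. In the sketch below, only the selected values (0.001, 128, 100) come from the text; the other candidate values are hypothetical, and `toy_validation_error` is a placeholder constructed so the example runs, not a reflection of any measured validation error. In practice it would train the model and return its validation-set error.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Return the configuration with the lowest validation error."""
    names = list(grid)
    best_cfg, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical candidate values, three per hyperparameter.
grid = {
    "learning_rate": [0.0001, 0.001, 0.01],
    "batch_size": [64, 128, 256],
    "epochs": [50, 100, 150],
}

def toy_validation_error(cfg):
    # Placeholder objective; replace with train-then-validate in real use.
    return (abs(cfg["learning_rate"] - 0.001) * 100
            + abs(cfg["batch_size"] - 128) / 128
            + abs(cfg["epochs"] - 100) / 100)

n_combinations = len(list(product(*grid.values())))
best_cfg, best_score = grid_search(toy_validation_error, grid)
```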
3.4. Experimental Results Analysis
In the point forecasting phase, the proposed method and Methods 1, 2, and 3 are used to conduct ultra-short-term power forecasting for each client station. Since Method 4 is identical to the proposed method in the point forecasting phase, no repeated verification is performed for Method 4. For time scales from 1 h ahead to 4 h ahead, the average NRMSE values of each forecasting method across all stations are shown in Table 2, and the corresponding average ACC values are presented in Table 3. The proposed method is superior in both evaluation metrics. Compared with Methods 1 and 2, the proposed method improves the NRMSE by 0.55% and 0.41%, respectively, at the 1 h ahead time scale, and by 1.17% and 2.59%, respectively, at the 4 h ahead time scale. The ACC is improved by 1.61% and 1.74%, respectively. This indicates that, compared with traditional parameter-aggregated federated learning, the collaborative feature federated learning method proposed in this paper effectively mines the spatiotemporal correlations between different client stations by introducing their temporal features and completing a dual-layer aggregation of information with the global model, thereby enhancing forecasting accuracy while ensuring the privacy of individual station data. Method 1 shows improved accuracy over Method 2 by leveraging spatiotemporal correlation information to some extent through the aggregation of model parameters: although its forecasting results at the 1 h ahead time scale are not as good as those of Method 2, its overall forecasting performance, evaluated by the 16-step results, is better. Comparing Methods 2 and 3, the autoencoder built on the Transformer model outperforms the GRU neural network by 2.45% and 2.14% in the NRMSE metric at the 1 h ahead and 4 h ahead time scales, respectively, and by 1.42% in the ACC metric. This demonstrates that the proposed autoencoder can effectively learn long-term dependencies within local station data and focus on more critical temporal features through the multi-head attention mechanism, acquiring more important information during training and thus achieving more accurate forecasting results than traditional neural networks. Furthermore, when the decrease in the NRMSE metric between the 1 h ahead and 4 h ahead time scales is calculated for each method, the proposed method shows a reduction of only 2.21%, while the other methods show reductions ranging from 2.83% to 4.39%. This indicates that the proposed method performs more stably in ultra-short-term 16-step forecasting, with the forecasting model better capturing trends in future data, resulting in superior forecasting performance.
Taking Station No. 15 as an example, the forecasting results and forecasting errors of the four methods are shown in Figure 3. The curve generated by the proposed method closely aligns with the actual values, and the overall error is relatively low. Furthermore, the distribution characteristics of forecasting errors vary under different forecasting output results and fluctuation patterns. For example, as shown in the figure, the first day is sunny, with relatively smooth power output fluctuations, and the amplitude of forecasting errors is relatively small. The second day is cloudy, with more intense fluctuations in PV power output, leading to similarly intense fluctuations in error. The third day is overcast, with lower forecasting power values and consequently smaller error values. This indicates that there is a relationship between the distribution patterns of forecasting errors and the characteristics of power output fluctuations. Moreover, under different weather conditions, the distribution patterns of forecasting errors differ with the forecasting values. This insight also provides a basis for constructing joint probability distributions in subsequent probabilistic forecasting research.
Based on the above discussion, to further demonstrate the effectiveness of the proposed method, the average values of the forecasting evaluation metrics for each station under a 4 h ahead time scale were statistically analyzed for three typical weather conditions, as shown in Table 4. The comparison shows that the proposed method achieves good forecasting results in all typical weather scenarios. Under sunny conditions, the proposed method achieved an NRMSE of 93.57%, which is 1.15%, 2.61%, and 1.56% higher than Method 1, Method 2, and Method 3, respectively. In terms of the ACC metric, the proposed method reached 94.88%, outperforming the other methods by 1.76%, 2.86%, and 1.71%, respectively. This indicates that the proposed method offers higher forecasting accuracy and stability under sunny conditions. Because irradiance changes relatively regularly and the power output of DPV systems is stable under clear skies, all methods achieve good forecasting results, with the proposed method being the most effective. In overcast conditions, the proposed method’s NRMSE was up to 4.01% better than that of the other methods, and its ACC was up to 3.93% higher. Under cloudy conditions, the proposed method outperformed the others by up to 3.12% and 2.7% in the two evaluation metrics, respectively. In overcast and cloudy weather, the rapid movement and change of cloud cover lead to highly irregular patterns of solar irradiance. Solar irradiance can drop sharply from high to low levels within a few minutes, causing significant fluctuations in PV system output power over short periods. This unstable irradiance makes it difficult for traditional forecasting models to accurately capture the trends of PV power output, leading to decreased forecasting accuracy, especially on short time scales (such as ultra-short-term forecasting). The proposed method addresses these challenges by considering the spatiotemporal correlations between the target station and nearby stations, obtaining more information about irradiance and power output changes. Stations at different geographic locations may experience similar weather patterns but with certain lags or advances in timing. Sharing this information gives a more comprehensive picture of current and future irradiance trends, thereby improving forecasting accuracy. Furthermore, the multi-head attention mechanism enhances the model’s ability to focus on key feature changes, effectively capturing information essential for accurate forecasting. This helps mitigate the increased uncertainty brought by overcast and cloudy conditions, resulting in superior forecasting performance compared with the other methods.
Based on the aforementioned analysis, the proposed method demonstrates superior point forecasting performance. On one hand, the Transformer-based autoencoder model constructed in this paper possesses superior time series modeling capabilities, effectively enhancing the accuracy of power forecasting by focusing on key temporal features. On the other hand, the proposed method recognizes that information sharing among neighboring DPV stations is an effective way to improve forecasting accuracy, achieving a deeper exploration of spatiotemporal correlations through the dual-layer information interaction of global models and global temporal features. Consequently, under the premise of protecting the data privacy of local clients, the proposed method fully considers the relationships between stations, thereby improving the accuracy of point forecasting.
To further demonstrate the effectiveness of the proposed method, the ACC of the different forecasting methods across all client stations was statistically analyzed, as shown in Figure 4. The proposed method achieves the best forecasting performance at the majority of stations. This indicates that, by constructing a collaborative feature federated learning framework, establishing a more comprehensive information sharing mechanism between the central server and local clients, and building a Transformer model for local temporal feature extraction, the proposed method not only effectively mines the spatiotemporal correlations between different client stations but also fully considers the personalized features of local sites, thereby achieving higher forecasting accuracy. Method 1, compared with Method 2, considers more of the spatiotemporal correlations between stations and, despite sharing less information than the proposed method, still manages to improve forecasting accuracy. Method 2, compared with Method 3, considers the temporal characteristics of the power station itself, enabling the model to better focus on the fluctuation patterns of local power data. It should be noted that the proposed method does not yield the best forecasting results for Stations 13, 18, 25, and 26. This might be due to weaker power output correlations between these stations and the others, which not only makes it difficult for the global model to effectively capture the specific features of these stations during training, but also introduces a significant amount of redundant information into the local forecasting model, reducing its forecasting accuracy. Overall, however, the proposed method still achieves the best forecasting results, followed by Method 1, Method 2, and Method 3. This also verifies the advantages of introducing global feature aggregation and key spatiotemporal correlation feature generation into collaborative feature federated learning during the forecasting process.
In the probabilistic forecasting phase, the proposed method and the four other methods were used to conduct ultra-short-term probabilistic forecasting for each power station. In this paper, the Clayton Copula function was selected to fit the cumulative probability distribution of deterministic point forecasting results and forecasting errors, with the primary parameter θ of the Copula function being estimated using maximum likelihood estimation.
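A minimal sketch of fitting a Clayton copula by maximum likelihood, and of inverting its conditional distribution to obtain error bounds for a given forecast level, is shown below. The Clayton density and conditional inverse are standard results. The grid-based maximization, the search range for θ, the synthetic data, and all function names are assumptions for illustration; the inputs `u` and `v` stand for the forecast values and forecast errors after transformation to the unit interval through their marginal distributions.

```python
import numpy as np

def clayton_loglik(theta, u, v):
    """Log-likelihood of the Clayton copula density for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return np.sum(np.log1p(theta)
                  - (1.0 + theta) * (np.log(u) + np.log(v))
                  - (2.0 + 1.0 / theta) * np.log(s))

def fit_clayton(u, v, thetas=np.linspace(0.05, 10.0, 200)):
    """Maximum likelihood over a grid of candidate theta values."""
    loglik = [clayton_loglik(t, u, v) for t in thetas]
    return float(thetas[int(np.argmax(loglik))])

def conditional_quantile(q, u, theta):
    """Solve P(V <= v | U = u) = q for v under the Clayton copula."""
    return ((q ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)

# Synthetic dependent pairs drawn from a Clayton copula with theta = 2.
rng = np.random.default_rng(0)
u = rng.uniform(1e-6, 1.0 - 1e-6, size=4000)
w = rng.uniform(1e-6, 1.0 - 1e-6, size=4000)
v = conditional_quantile(w, u, 2.0)

theta_hat = fit_clayton(u, v)

# 90% bounds for the error (on the probability scale) at forecast level u0.
u0 = 0.3
lo = conditional_quantile(0.05, u0, theta_hat)
hi = conditional_quantile(0.95, u0, theta_hat)
```

The bounds `lo` and `hi` live on the probability scale; mapping them back through the inverse marginal distribution of the errors would give the error limits, which are then added to the point forecast to form the interval. Because the bounds depend on `u0`, the interval width changes with the forecast level.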
Taking Station No. 15 as an example again, the accuracy comparisons of various probabilistic forecasting methods at different confidence levels are presented in Table 5. By comparing the experimental results of different methods, it can be seen that, under three different confidence levels, the proposed method exhibits the best SS, effectively narrowing the forecasting intervals according to the distribution patterns of forecasting errors while maintaining high coverage, thus achieving a balance between reliability and sensitivity in probabilistic forecasting.
Comparing the proposed method with Methods 1, 2, and 3 shows that, building on high-precision point forecasting, the proposed method also achieves high-precision probabilistic forecasting. Under the three confidence levels, the SS of the proposed method is improved by 30.01%, 29.54%, and 30.81%, respectively. Moreover, taken together with the point forecasting results, higher point forecasting accuracy is found to lead to higher probabilistic forecasting accuracy, which indicates that point forecasting accuracy plays a crucial role in the precision of probabilistic forecasting. High-precision point forecasts reflect actual values more accurately, narrow the range of the error distribution, stabilize forecasting errors, and reduce the occurrence of extreme values and sharp fluctuations. This helps the probabilistic forecasting model estimate uncertainty more accurately. Comparing the proposed method with Method 4 shows that probabilistic forecasting based on the joint probability distribution yields better results than forecasting that considers only the error distribution. Under the three confidence levels, although the proposed method reduces the PICP by 2.42%, 2.97%, and 3.01%, respectively, it narrows the PINAW by 27.14%, 24.51%, and 22.33%, respectively, thereby improving the SS by 22.27%, 21.29%, and 23.80%, respectively. This suggests that the joint probability distribution, by considering forecasting values and errors simultaneously, captures the intrinsic structure and mutual relations of the data more comprehensively. By exploring the dependency structure between forecasting values and errors, it refines the characterization of error distribution patterns and achieves higher-precision probabilistic forecasting.
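To make the trade-off between coverage and interval width concrete, the following sketch computes PICP and PINAW under their commonly used definitions: PICP is the fraction of observations that fall inside their forecasting interval, and PINAW is the mean interval width divided by the range of the observed values. The function names and the choice of normalizer are assumptions; the paper may normalize by a different quantity, such as installed capacity. SS is omitted because its exact definition is not restated in this section.

```python
def picp(y_true, lower, upper):
    """Prediction interval coverage probability: share of targets inside the interval."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def pinaw(y_true, lower, upper):
    """Mean interval width, normalised by the range of the observed targets.

    Normalising by max(y) - min(y) is an assumption of this sketch.
    """
    span = max(y_true) - min(y_true)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y_true)
    return mean_width / span


# Toy data (hypothetical values, not taken from Table 5).
y_obs = [0.0, 2.0, 4.0, 6.0, 10.0]
lower_bounds = [-1.0, 1.0, 4.5, 5.0, 8.0]
upper_bounds = [1.0, 3.0, 5.5, 7.0, 12.0]

coverage = picp(y_obs, lower_bounds, upper_bounds)   # 4 of 5 targets are covered
sharpness = pinaw(y_obs, lower_bounds, upper_bounds)  # mean width 2.2 over a range of 10
```

A method can lose a little coverage and still be preferable if its intervals become much narrower, which is the pattern reported above for the proposed method relative to Method 4.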
At a 90% confidence level, the probabilistic forecasting results of each method are shown in
Figure 5. Method 4 considers only the distribution characteristics of the errors, so the width of its forecasting interval is determined entirely by the quantiles of the error distribution and is unaffected by changes in the forecasting values. This approach assumes that the error distribution remains constant, which leads to a constant forecasting interval width. While a fixed interval width is simple and intuitive, it overlooks the dynamic nature of actual power output over time. Consequently, the probabilistic forecasts of Method 4 lack sensitivity to real-time power fluctuations and fail to capture rapid short-term changes, resulting in poorer predictive performance. In contrast, the proposed method, which is based on joint probability distribution modeling, exhibits significant advantages. It considers not only the distribution of errors but also the patterns of change in the point forecasting results. This allows the width of the forecasting intervals to adjust dynamically according to actual conditions, closely tracking the true power values, and enables the model to reflect changes in the distribution of forecasting errors caused by differences in power fluctuation characteristics. By introducing Copula functions, the proposed method can flexibly describe the complex dependency relationships between PV output and forecasting errors.
In summary, the proposed method achieves dynamic adjustment of forecasting intervals through joint probability distribution modeling. This both preserves the accuracy of probabilistic forecasting and significantly enhances its sensitivity to real-time power fluctuations.
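The difference between a fixed error interval and a forecast-dependent one can be sketched with the closed-form conditional distribution of the Clayton Copula. This is an illustrative sketch under stated assumptions, not the procedure used in this paper: u denotes the CDF value of the point forecast, v the CDF value of the forecasting error, and θ = 2 is an arbitrary example value. All function names are hypothetical.

```python
def clayton_conditional_cdf(v, u, theta):
    """P(V <= v | U = u) for a Clayton copula with theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return u ** (-(1.0 + theta)) * s ** (-(1.0 + 1.0 / theta))


def clayton_conditional_quantile(p, u, theta):
    """Closed-form inverse of clayton_conditional_cdf with respect to v."""
    inner = (p ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0
    return inner ** (-1.0 / theta)


def conditional_interval(u, theta, confidence=0.90):
    """Central interval for V given U = u, on the uniform (CDF) scale."""
    alpha = 1.0 - confidence
    return (clayton_conditional_quantile(alpha / 2.0, u, theta),
            clayton_conditional_quantile(1.0 - alpha / 2.0, u, theta))


def empirical_quantile(sorted_errors, q):
    """Map a CDF level q back to error units using sorted historical errors."""
    idx = min(len(sorted_errors) - 1, max(0, int(q * len(sorted_errors))))
    return sorted_errors[idx]


# An error-only interval uses the same CDF levels (0.05, 0.95) for every forecast.
# The conditional interval shifts with the forecast level u instead.
example_theta = 2.0  # arbitrary illustrative value, not an estimate from the paper
low_output_bounds = conditional_interval(0.2, example_theta)
high_output_bounds = conditional_interval(0.8, example_theta)
```

Passing each bound through `empirical_quantile` converts it into an error value, and combining that value with the point forecast, under whatever sign convention defines the error, gives an interval in power units. Because the bounds depend on u, the resulting interval changes from one time step to the next, unlike the constant-width interval of an error-only model.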
4. Conclusions
To provide more comprehensive reference information for power system dispatch, this paper proposes a privacy-preserving deep federated learning method for the probabilistic forecasting of ultra-short-term power generation from DPV systems. The method improves the power forecasting accuracy of target stations by mining the correlation information of power output between stations while protecting the data privacy of each client. In the point forecasting phase, a collaborative feature federated learning method is proposed. First, the central server aggregates model parameters from each client and generates global features, which are then distributed to each station to facilitate the interaction and sharing of global and local information. Then, a Transformer-based autoencoder model is established for each client to deeply mine the local temporal features of the station. This dual-layer information interaction in federated learning effectively enhances the accuracy of point forecasting, providing reliable data support for probabilistic forecasting. In the probabilistic forecasting phase, the Copula function is used to construct a joint probability distribution model of the forecasting values and forecasting errors. This allows a detailed study of error distribution patterns and yields more adaptive forecasting intervals, improving the sensitivity and accuracy of probabilistic forecasting.
For future research, the following improvements can be made:
(1) In this paper, the aggregation of global features is introduced during the traditional federated learning process, but no improvements have been made to the aggregation of model parameters. Therefore, future work will further explore global model aggregation methods that are more suitable for the characteristics of DPV power generation, to enhance the performance of federated learning;
(2) When constructing the joint probability distribution model in this paper, only the Clayton Copula function is used for fitting. Thus, future work will continue to investigate other types of fitting functions to find those that better fit PV data, further improving the performance of probabilistic forecasting;
(3) Future research will consider introducing a weather anomaly detection mechanism to identify extreme weather events and assess the model’s performance under different weather anomalies and grid instability conditions.