1. Introduction
The rapid growth in energy demand and intensifying climate change have promoted the widespread application of renewable energy [1]. However, the discontinuity and fluctuation of photovoltaic power generation present severe challenges to the stability, reliability, and operational efficiency of the smart grid [2]. To navigate these difficulties, the precision of photovoltaic power-generation forecasting is of paramount importance, as it enhances the operational efficiency of power grids and mitigates the costs associated with balancing supply and demand [3]. Moreover, accurate forecasting is instrumental in advancing the integration of photovoltaic power generation with state-of-the-art technologies, including energy storage solutions and smart grid systems [4].
Generally speaking, the field of photovoltaic power-generation forecasting encompasses a diverse array of methodologies, including physical modeling, conventional machine learning, and deep learning approaches, with a predominant focus on model design and optimization [5,6,7,8]. These techniques are fundamentally data-dependent, and their efficacy hinges on data quality. However, photovoltaic systems periodically experience downtime for maintenance and other reasons, resulting in gaps in data records [9]. The resultant data incompleteness severely disrupts the continuity of time-series signals, compromising the accuracy of prediction models, particularly in short-term forecasts [1].
To tackle intermittent data gaps, Brooks M.J. [10] and Layanun V. [11] propose linear and cubic spline interpolation methods, yet these prove inadequate in the face of abrupt weather shifts [12], and their precision wanes significantly when confronted with extensive, continuous missing photovoltaic output data [13]. References [14,15] employ regression trees for data imputation, assuming linear or predefined relationships between variables. However, these methods may fall short in accurately capturing the data's intrinsic patterns when faced with sparsity, dramatic fluctuations, or nonlinear dependencies, leading to suboptimal or failed imputation outcomes.
Additionally, supervised learning models such as Multi-Layer Perceptrons (MLPs) [16], Super-Resolution Perception Convolutional Neural Networks (SRPCNNs) [17], and Long Short-Term Memory networks (LSTMs) [12,18] have emerged as the norm in contemporary data imputation technologies. In an effort to relax the assumption that missing data share the same distribution or features as observed data, semi-supervised learning-based imputation methods such as SolarGAN [19], CC-GAIN [13], and CM-GAN [20] have also gained widespread application.
Although CC-GAIN and SolarGAN are both strong approaches, their generators have equal input and output sequence lengths and are primarily designed to impute missing values within a portion of the input sequence. When the length of the missing data exceeds the predefined input sequence length, these models fail to perform effectively. CM-GAN focuses on the relationships between cross-domain data but overlooks the representation of sequential dependencies in time-series data.
To mitigate these challenges, this work introduces a photovoltaic power data imputation approach that leverages the Wasserstein Generative Adversarial Network (WGAN) in conjunction with LSTM networks. The integration of a gradient penalty mechanism, along with a single-batch multi-iteration training strategy, ensures the stability of the model during training. The key contributions of this work are as follows:
- (1)
Designing an LSTM-WGAN framework built on the WGAN architecture. The network adopts a data-driven design with quasi-convex properties, specifically tailored to the complexities of photovoltaic power data imputation.
- (2)
Utilizing real-world photovoltaic-system data for testing. The quality of the generated data is demonstrated through three validation approaches: frequency-domain assessment, t-SNE visualization, and a comparative analysis of forecasting performance.
The organization of this paper is as follows: Section 2 provides an in-depth description of the photovoltaic power-generation data imputation method based on LSTM-WGAN. Section 3 presents the experimental results and tests the proposed method on a real-world dataset to demonstrate its effectiveness. Finally, conclusions are drawn in Section 4.
2. LSTM-WGAN Method
In the imputation task, capturing the essence of the sequence’s characteristics takes precedence over the precise replication of values. In such scenarios, a GAN network emerges as the ideal candidate. Although there are several GAN variants, such as Conditional GAN (CGAN) and CycleGAN, most of them are not suitable for this task. For example, CGAN requires labeled data, which are not at our disposal. Meanwhile, CycleGAN, with its dual generator and discriminator architecture, is crafted for domain-to-domain translations, thus rendering it incompatible with our data imputation needs. In contrast, WGAN excels in generating data that closely adheres to the real data distribution. Consequently, WGAN is chosen for data imputation, due to its superior training stability, reliable convergence, and ability to generate high-quality data through the gradient penalty mechanism.
A conventional GAN network consists of two main components: the generator and the discriminator. The generator generates simulated data through input noise. Nevertheless, when applying standard GANs to the task of imputing missing data in photovoltaic power time-series, ensuring continuity at the edges of the imputed data can be quite challenging. To address this issue, our objective is to integrate historical data into the generation process, thereby infusing temporal patterns and imposing smoothness constraints to enhance the quality of the generated data.
Considering the temporal dynamics inherent in photovoltaic power data, the LSTM network was selected for our approach. Although other Recurrent Neural Networks (RNNs) such as Gated Recurrent Units (GRUs) are capable of capturing sequential dependencies, LSTM was favored, due to its extensive adoption and proven track record. It offers the optimal balance between performance and model complexity for the particular requirements of our application.
Thus, by integrating the strengths of both WGAN and LSTM, we achieve a stable and dependable data imputation process that adheres to the sequential patterns intrinsic to photovoltaic time-series data. This hybrid approach is particularly tailored to the task we aim to accomplish. The complete system architecture is depicted in
Figure 1, and a thorough explanation is provided in the subsequent sections.
2.1. Generator
Let $g_{out}$ represent a vector comprising $k$ photovoltaic power-generation data points starting at time $t_0$, and let $q$ be a vector consisting of $m$ consecutive photovoltaic power-generation values that are known before $t_0$. The feature extraction model is constructed as follows:
where LSTM denotes the Long Short-Term Memory network [21], and its fundamental building block is depicted as follows: where $h_{i-1}$ represents the output from the previous level and $q_i$ represents the input at the current level; $\sigma$ and $\tanh$ denote the activation functions; and $W_o$, $W_f$, $W_b$, $W_c$, $b_o$, $b_f$, $b_b$, $b_c$ are the parameters of the network model.
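For reference, a standard LSTM cell consistent with the parameters listed above can be written as follows; assigning the subscript $b$ to the input gate is our assumption based on the parameter list rather than the authors' notation:

$$
\begin{aligned}
f_i &= \sigma\left(W_f[h_{i-1}, q_i] + b_f\right), &&\text{(forget gate)}\\
b_i &= \sigma\left(W_b[h_{i-1}, q_i] + b_b\right), &&\text{(input gate, assumed)}\\
\tilde{c}_i &= \tanh\left(W_c[h_{i-1}, q_i] + b_c\right), &&\text{(candidate cell state)}\\
c_i &= f_i \odot c_{i-1} + b_i \odot \tilde{c}_i, &&\text{(cell state update)}\\
o_i &= \sigma\left(W_o[h_{i-1}, q_i] + b_o\right), &&\text{(output gate)}\\
h_i &= o_i \odot \tanh(c_i). &&\text{(hidden output)}
\end{aligned}
$$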
Given that photovoltaic power generation is subject to a multitude of stochastic elements, including weather conditions, temperature fluctuations, cloud cover, and more, these influences typically emerge as noise or random variations. Consequently, the global feature representation of photovoltaic power-generation sequences encapsulates overarching patterns associated with randomness or noise, and is built as
where
FC1 and
FC2 both represent fully connected layers. Subsequently, by concatenating the feature vector
from Equation (1) with the global feature vector
, the comprehensive feature can be expressed as follows:
The final generated vector for photovoltaic power generation, denoted as $g_{out}$, is formulated as follows: where $FC_3$ and $FC_4$ represent fully connected layers.
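As a minimal sketch of the generator structure just described (LSTM features from the known history $q$, a global feature derived from input noise via $FC_1$ and $FC_2$, concatenation, and output through $FC_3$ and $FC_4$), the PyTorch code below may help; the hidden size, noise dimension, and Leaky ReLU slope are illustrative assumptions rather than the authors' exact settings.

```python
# Minimal sketch of the generator; layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, m, k, hidden=64, noise_dim=32):
        super().__init__()
        # LSTM extracts sequential features from the m known values preceding t0.
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        # FC1/FC2 map the input noise to a global feature vector.
        self.fc1 = nn.Linear(noise_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        # FC3/FC4 map the concatenated feature to the k generated values.
        self.fc3 = nn.Linear(2 * hidden, hidden)
        self.fc4 = nn.Linear(hidden, k)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, q, z):
        # q: (batch, m, 1) known history; z: (batch, noise_dim) random noise.
        _, (h_n, _) = self.lstm(q)                              # h_n: (1, batch, hidden)
        seq_feat = h_n.squeeze(0)                               # sequential feature vector
        glob_feat = self.act(self.fc2(self.act(self.fc1(z))))   # global feature vector
        feat = torch.cat([seq_feat, glob_feat], dim=1)          # comprehensive feature
        return self.fc4(self.act(self.fc3(feat)))               # g_out: (batch, k)
```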
To facilitate training, we selected activation functions with quasi-concave or quasi-convex properties, such as ReLU and Rectified Exponential Unit (REU) [
1]. These functions were tested to evaluate their effects on training stability and convergence speed. Leaky ReLU was ultimately chosen for our model, due to its outstanding performance.
Notably, to prevent the generator from simply shortening the input sequence to produce the output sequence, it is crucial to impose the constraint m < k. This ensures that the generator is forced to learn meaningful correlations and patterns within the existing data, which allows it to accurately predict missing values. By repeating this data generation process, the model can produce sequences of arbitrary length. This approach, therefore, enables the model to handle missing data at both short and long intervals in a consistent manner.
To prevent overflow due to large data values, Min–Max Scaling was employed to normalize the input data before feeding it into the model. Once the model generates the output, the inverse transformation is applied to ensure that the normalization process does not affect the results.
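As a sketch of how the two preceding points fit together (not the authors' code), the routine below scales the history, conditions the generator on the last $m$ known values, and appends $k$ generated values per pass until a gap of arbitrary length is filled, before inverting the scaling. The `Generator` interface is the one assumed in the sketch above.

```python
# Minimal sketch of repeated generation for arbitrary-length gaps with Min-Max scaling.
import numpy as np
import torch

def impute_gap(history, gap_len, gen, m, k, noise_dim=32):
    # history: 1-D array of known values preceding the gap (length >= m).
    lo, hi = history.min(), history.max()
    filled = list((history - lo) / (hi - lo + 1e-8))     # Min-Max scaling to [0, 1]
    n_hist = len(filled)
    while gap_len > 0:
        q = torch.tensor(filled[-m:], dtype=torch.float32).view(1, m, 1)
        z = torch.randn(1, noise_dim)
        with torch.no_grad():
            g_out = gen(q, z).view(-1).tolist()          # k new values per pass
        take = min(k, gap_len)
        filled.extend(g_out[:take])
        gap_len -= take
    imputed = np.array(filled[n_hist:])
    return imputed * (hi - lo) + lo                      # reverse normalization
```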
2.2. Discriminator
The primary task of the discriminator is to distinguish between real and generated data, which is a task that falls under binary classification. Its role is to evaluate the authenticity of the input data and provide critical feedback to refine the generator’s performance throughout the training process. The discriminator’s architecture is typically composed of convolutional layers, followed by fully connected layers, and culminates in a sigmoid activation function for binary classification purposes. The underlying formula for this process is as follows:
where
Xin and
Xout denote the input and output of the discriminator, respectively; LReLU(.) denotes the Leaky ReLU non-linear activation function;
FC stands for the fully connected layer; and
conv(.) refers to the convolutional layer, which is employed to extract features from the input data. Lastly, the output is transformed to the range [0, 1] via the
sigmoid function, which offers a probabilistic indication of whether the input data are actual or generated.
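A minimal PyTorch sketch of such a discriminator is given below; the kernel sizes, channel counts, and layer depth are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of the convolutional discriminator described in the text.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, k, channels=16):
        super().__init__()
        self.features = nn.Sequential(          # conv(.) layers extract features
            nn.Conv1d(1, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),                  # LReLU(.) non-linear activation
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * k, 1),         # FC layer
            nn.Sigmoid(),                       # maps the score to [0, 1]
        )

    def forward(self, x):
        # x: (batch, k) real or generated power sequence
        return self.classifier(self.features(x.unsqueeze(1)))
```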
2.3. Model Training
2.3.1. Model Training
In this proposed model, the training procedure entails the concurrent optimization of two key components: the generator G and the discriminator D. The objective of the generator is to fabricate data of such high fidelity that they elude detection by the discriminator, whereas the discriminator is responsible for precisely differentiating actual and generated data. Customarily, this adversarial framework is formulated as a minimax optimization problem, where the design of the loss function is critical for ensuring convergence.
Unfortunately, ordinary GANs are subject to training instability. Accordingly, the Wasserstein GAN (WGAN) with Gradient Penalty is employed to stabilize training and mitigate overfitting. Gradient penalty is a regularization technique, which encourages the discriminator to remain Lipschitz continuous without relying on weight clipping.
The loss functions for both the generator and the discriminator are delineated as follows: where $\mathbb{E}$ denotes the expected value; $P_r$ and $P_g$ represent the distributions of real and generated data, respectively; $\lambda$ and $\varepsilon$ denote the regularization coefficient and a random value drawn from the interval [0, 1], respectively; and the real data, the generated data, and the interpolated data enter the loss, where the interpolated data are expressed as a convex combination of a real sample and a generated sample weighted by $\varepsilon$.
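For reference, the standard WGAN-GP formulation that this description matches is given below in our own notation, where $x$, $\tilde{x}$, and $\hat{x}$ denote real, generated, and interpolated samples; the sign conventions may differ from the authors':

$$
\begin{aligned}
L_D &= \mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x\sim P_r}\big[D(x)\big] + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert\nabla_{\hat{x}} D(\hat{x})\rVert_2 - 1\big)^{2}\Big],\\
L_G &= -\,\mathbb{E}_{\tilde{x}\sim P_g}\big[D(\tilde{x})\big], \qquad \hat{x} = \varepsilon x + (1-\varepsilon)\tilde{x}.
\end{aligned}
$$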
The discriminator, acting as a binary classifier, produces an output between 0 and 1, indicating whether the input is generated or real data, respectively. The objective of the generator is to deceive the discriminator by maximizing $L_G$, thereby generating data that closely mimic the distribution of real data. Concurrently, the discriminator sharpens its discriminative capabilities by minimizing $L_D$, thereby improving its ability to differentiate between real and generated data.
2.3.2. Hyperparameter Determination
The regularization penalty λ is a critical parameter in the gradient penalty mechanism of WGAN. It controls the trade-off between enforcing the Lipschitz continuity condition and maintaining model stability. To determine λ, we referred to commonly used values (e.g., 10) in the WGAN literature as a starting point. We then conducted a grid search over a range of values (5, 10, 20). Through experimentation, we found that λ = 10 offered the best balance between stable training and high-quality data generation for our specific task.
The optimal number of iterations for the generator and discriminator is a key question. GANs can suffer from instability if the generator and discriminator are not balanced in terms of updates. If one component is updated too frequently, it can become too powerful, making it difficult for the other to learn.
To address this issue, we initially performed a series of training runs with varying numbers of iterations (1, 2, 5, and 10 iterations for each component). During these runs, we monitored the loss values for both, observing whether there were sharp fluctuations or divergence. By experimenting with different iteration counts, we found that updating the generator 5 times for each batch and the discriminator 2 times provided stable training and reliable convergence.
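The sketch below illustrates one training step under this schedule (two discriminator updates followed by five generator updates per batch, with $\lambda = 10$); the gradient-penalty term follows the standard WGAN-GP recipe and is not the authors' exact code.

```python
# Minimal sketch of one WGAN-GP training step with the 5/2 update schedule.
import torch

def gradient_penalty(D, real, fake, lam=10.0):
    eps = torch.rand(real.size(0), 1)                       # random epsilon in [0, 1]
    inter = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(D(inter).sum(), inter, create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()   # Lipschitz penalty

def train_step(G, D, opt_g, opt_d, q, real, noise_dim=32):
    for _ in range(2):                                      # discriminator updates
        z = torch.randn(real.size(0), noise_dim)
        fake = G(q, z).detach()
        loss_d = D(fake).mean() - D(real).mean() + gradient_penalty(D, real, fake)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    for _ in range(5):                                      # generator updates
        z = torch.randn(real.size(0), noise_dim)
        loss_g = -D(G(q, z)).mean()                         # try to fool the discriminator
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```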
The batch size determines the number of samples used in each training iteration. It directly affects the training dynamics, including convergence speed and stability. A smaller batch size provides more frequent updates but can lead to noisy gradients, while a larger batch size stabilizes training but may require more computational resources. We started with a series of batch sizes (32, 64, 128, and 256) and tested various configurations. Through these experiments, we found that a batch size of 128 provided an optimal trade-off, ensuring stable training and efficient use of computational resources.
2.3.3. Overfitting Problem
Unlike supervised models, which have explicit feedback in the form of labels, GANs only receive feedback based on whether the generated samples are realistic (via the discriminator). This less explicit feedback increases the risk of the model focusing too much on irrelevant aspects of the data. As a result, GANs are generally more prone to overfitting compared to supervised learning models.
Overfitting in GANs typically manifests in two ways: generator overfitting and discriminator overfitting. In the former case, the generator might learn to produce outputs that are too similar to the training data, resulting in a lack of diversity in the generated samples. In the latter case, if the discriminator is too powerful or is trained for too many iterations without sufficient regularization, it may become overly sensitive to small fluctuations in the data, thereby providing poor feedback to the generator.
To address potential overfitting issues, we implemented two strategies during model training: (1) we applied gradient penalty regularization in the WGAN framework to stabilize training and mitigate overfitting; and (2) we randomly deactivated a fraction of neurons during training (i.e., dropout) to encourage better generalization.
3. Experiments and Results
All experiments in this work were conducted on a computer equipped with an NVIDIA GeForce RTX 4070 graphics card (sourced from Santa Clara, CA, USA), powered by an Intel i9-14900HX processor (sourced from Santa Clara, CA, USA), and supplemented with 64 GB of Samsung RAM (sourced from Seoul, South Korea). Note that to train the model with limited resources, fewer LSTM layers or fewer units per layer can be used to reduce computational load and memory usage. However, this may come at the cost of reduced model performance.
3.1. Data Analysis
Figure 2 shows photovoltaic power generation data from the Desert Knowledge Australia Solar Centre in 2023. Due to regular shutdowns for equipment maintenance or component aging, as well as forced stoppages when electricity demand is insufficient, the photovoltaic power-generation system often experiences intermittent operation, resulting in discontinuous generation records in the photovoltaic data.
To visually analyze the characteristics of the photovoltaic power-generation data sequence, Discrete Cosine Transform (DCT) analysis was employed. DCT is a technique used to transform a time-domain signal into the frequency domain. The result consists of frequency components that represent various characteristics of the signal. In the context of time-series data such as photovoltaic power generation, DCT helps break down the signal into a series of frequency components, each of which carries specific information about the data.
Low-frequency components in the DCT represent the overall trends or slow changes in the data. In time-series data, these correspond to the smooth, large-scale patterns or longer-term behaviors. For example, in the context of photovoltaic data, low frequencies might correspond to overall seasonal variations or long-term power generation trends.
High-frequency components correspond to rapid fluctuations or short-term variations in the data. These are typically associated with the fine details, such as daily changes in power generation, noisy variations, or transient events.
If the imputed data results in significantly higher or lower amplitudes in the low-frequency range, it suggests that the imputation method has altered the global structure of the data. A significant increase in low-frequency amplitude could mean that the imputation method has introduced artificial or over-smooth trends, while a decrease might suggest that the imputation has removed important long-term patterns.
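The following sketch (assuming SciPy) shows the kind of check described above: computing the first few low-frequency DCT coefficients of the original and imputed series and comparing their amplitudes.

```python
# Minimal sketch of the low-frequency DCT comparison; a large change in these
# amplitudes after imputation would indicate a distorted global structure.
import numpy as np
from scipy.fft import dct

def low_freq_components(series, n_components=5):
    coeffs = dct(np.asarray(series, dtype=float), norm="ortho")
    return coeffs[:n_components]        # low-frequency amplitudes (global trend)

# Example usage:
# print(low_freq_components(original_series))
# print(low_freq_components(imputed_series))
```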
Relative to the low-frequency components, high-frequency components are difficult to visualize. Therefore, only a subset of the low-frequency components is shown in
Figure 3.
Figure 3b,c illustrates the first five low-frequency components of the DCT obtained with the zero-padding and 200-value filling methods for the missing data, respectively; clear differences in amplitude are evident between them.
When the imputed data deviate from the true data trends, they provide erroneous input that misguides data-driven models into learning incorrect relationships. This ultimately results in inaccurate predictions and flawed conclusions, which can significantly impact operational decisions and potentially increase costs. Consequently, accurate data imputation is vital to avoid changes in the amplitude of frequency components, thereby ensuring the overall integrity and accuracy of the dataset.
3.2. Validation Analysis
Figure 4 illustrates the evolution of loss values throughout the training process of the proposed network. In the experiment, the batch-optimization parameters for the generator and discriminator were configured to perform five and two iterations per batch, respectively. Observing the convergence of loss values across training iterations, it is evident that the adversarial strategy introduced in this work has effectively demonstrated its utility. Despite minor fluctuations observed during the training process, the system promptly regains stability, maintaining an overall robust stability.
Figure 5 displays the generated data along with their associated DCT frequency components. In contrast to the result with zero-padding, it can be observed that the frequency components of the generated data have been improved. These generated data effectively assist the prediction model, enhancing its ability to capture fundamental frequency characteristics.
To more intuitively illustrate the quality and reliability of the data generation process, a t-SNE (t-Distributed Stochastic Neighbor Embedding) visualization method is adopted. t-SNE conducts a nonlinear dimensionality reduction by first computing the pairwise similarities between data points in high-dimensional space and then finding a lower-dimensional representation that preserves these similarities. It preserves the relative distances between data points, so that similar points in high-dimensional space remain close to each other in the lower-dimensional representation. This technique is very useful for visualizing high-dimensional data in lower-dimensional spaces (typically 2D or 3D) to better understand the structure and distribution of data.
In our case, t-SNE is used to compare the real and generated data and interpret how well the generator has learned the underlying distribution of the real data. If the real and generated data points are well separated, it may indicate that the model is generating data that is not representative of the real data distribution. However, if the generated data overlaps significantly with the real data, it suggests that the generator has learned to mimic the real data distribution effectively, generating high-quality synthetic data.
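A minimal sketch of this comparison (assuming scikit-learn and Matplotlib) is shown below; windows of real and generated power data are embedded into two dimensions and plotted together, and the perplexity value is an illustrative choice.

```python
# Minimal sketch of the t-SNE comparison between real and generated data windows.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def tsne_compare(real_windows, generated_windows):
    data = np.vstack([real_windows, generated_windows])
    emb = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(data)
    n = len(real_windows)
    plt.scatter(emb[:n, 0], emb[:n, 1], s=8, label="real")
    plt.scatter(emb[n:, 0], emb[n:, 1], s=8, label="generated")
    plt.legend()
    plt.show()
```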
t-SNE provides a qualitative insight into the relationship between real and generated data, as depicted in
Figure 6. It can be seen that there is a substantial overlap between the original and generated data within the two-dimensional plane. This indicates that the generated data have effectively encapsulated the structural and distributive characteristics of the original dataset. This alignment not only confirms the effectiveness of the generation model, but also highlights the promising potential of the proposed method for data synthesis tasks.
3.3. Performance Test
To quantitatively assess the predictive accuracy, the experiment employed three principal evaluation metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination ($R^2$). Given that the actual data and the predicted data are denoted by $y$ and $\hat{y}$, respectively, the mathematical formulations for these metrics are outlined below.
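For reference, the standard definitions of these metrics, with $n$ samples, $\hat{y}_i$ denoting the $i$-th prediction, and $\bar{y}$ the mean of the actual values, are:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}.
$$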
The MAE is an intuitive metric because it averages the error magnitudes across all predictions; a lower MAE indicates a better result. The RMSE penalizes larger errors more heavily, while remaining easy to interpret because it shares the units of the target variable. $R^2$ quantifies the proportion of variance in the actual data that is explained by the predictions, with values closer to 1 indicating a better fit.
In this experiment, six models widely used in the field of time-series forecasting were selected to evaluate the impact of the proposed method on photovoltaic output prediction. The chosen forecasting models include Long Short-Term Memory (LSTM) [22], Bidirectional LSTM (BiLSTM) [23], Stacked LSTM (SLSTM) [24], a hybrid model of CNN and LSTM (CNN_LSTM) [25], Gated Recurrent Units (GRUs) [26], and Bidirectional GRU (BiGRU) [23].
To ensure fairness in the comparative analysis, all algorithms in the experiment used a standardized configuration: a batch size of 64, 50 training epochs, the Adam optimizer, and a fixed learning rate of 0.001. In all cases, 80% of the data was allocated for training, while the remaining 20% was reserved for testing.
The photovoltaic power-generation dataset we used has a sampling interval of 5 min and spans 1 year. The number of missing records varies across datasets. However, the total number of records exceeds 100,000 in the datasets.
Table 1 presents the performance comparison for the missing and imputed data at the DKA Solar Centre. In these tables, the suffix letter "F" stands for "imputed data". From the experimental results, it can be observed that the performance of photovoltaic output prediction after data imputation has improved in terms of MAE, RMSE, and $R^2$.
Although the improvement in each evaluation metric is relatively small, this does not indicate poor performance of the data imputation method. The primary reason is the fact that missing data account for less than 3% of the dataset, with the majority of the data being complete and intact. Additionally, testing datasets include varying lengths of missing data, ranging from 1 to 1797 points. The results indicate that the performance remains robust, even for short and long periods of missing data.
We chose the data from the DKA Solar Centre due to their high quality, detailed records, and public availability. These characteristics make the data a valuable resource for research reproducibility and benchmarking. However, including data from various geographical and climatic conditions would enhance the credibility of the results. For this reason, we conducted additional experiments on our own dataset to evaluate the effect of geographical and climatic conditions. The experimental results, as tabulated in
Table 2, indicate that our proposed method improves prediction accuracy after data imputation, even when applied to different datasets. However, due to company policies, we are unable to make the dataset from West China publicly available.
3.4. Practicality Analysis
Our model can be trained on standard hardware, with a cost not exceeding USD 1500. However, using larger datasets or deploying the model on smaller systems with limited computational power may result in longer training times. To mitigate this, one can reduce the computational load by using fewer LSTM layers or fewer units per layer, though this may come at the cost of slight reductions in model performance.
In terms of real-time processing, while the training process for the LSTM-WGAN model is relatively time-consuming, the generator’s inference speed is quite fast. Our tests show that the average time for the generator to complete a single data-generation process is approximately 0.0026 s. Considering that the current minimum sampling interval for PV data is 1 min, our method is well suited to meet the timeliness requirements for real-time processing.
Our model can handle time-series data, including abrupt changes or anomalies, by leveraging historical data and sequential patterns. Based on these results, it is capable of adapting to seasonal variations, since our experimental data span a full year. If the historical data include indications of sudden changes in weather conditions or unexpected shifts, our method is likely to accurately capture the change trends and correctly impute the data. However, if the historical data do not show any signs of such changes, the model may not be able to respond effectively.
In addition, applying the model to data from different geographical locations presents a challenge. The main reason is that the relative position of the Sun to the Earth varies, resulting in differences in sunlight hours, and it is difficult to infer geographic information solely from short-term historical data. Therefore, our method may require additional adaptations to effectively handle data from diverse geographical areas.
Note that although our method can produce sequences of arbitrary length by repeating the data-generation process, its performance may degrade when the missing data span prolonged periods. This is because our method relies heavily on historical data and sequential patterns to estimate missing values. Extensive gaps in the data can result in the loss of critical temporal features or trends, which the model depends on to make accurate predictions.
4. Conclusions
This work tackles the challenge of missing data in photovoltaic power-generation records, and proposes a photovoltaic power data imputation method based on WGAN and LSTM. The method robustly maintains model training stability and the integrity of the generated data by utilizing a designed data-driven GAN network, along with an implemented gradient penalty mechanism. Through frequency domain analysis of the generated data and t-SNE metric evaluation, the effectiveness of the proposed method in generating high-quality photovoltaic data is verified. The testing datasets include varying lengths of missing data, ranging from 1 to 1797 points. The results indicate that the performance remains robust, even for short and long periods of missing data.
Our model supports real-time processing, with an average generator inference time of 0.0026 s. It can handle time-series data with seasonal variations. While the model performs well when historical data reflect abrupt changes, it struggles if such patterns are absent. Furthermore, the trained model fails to generate the data at different geographical locations due to variations in sunlight hours and the difficulty of inferring geographic information from short-term data, requiring further adjustments for diverse regions. As a result, expanding our approach to distributed systems with varying geographic and climatic conditions is a key focus of our future research.