Short-Term Power Load Forecasting Method Based on Feature Selection and Co-Optimization of Hyperparameters
Abstract
1. Introduction
- A dynamic adjustment strategy based on the rate of change of the historical optimal value is proposed so that the Improved PID-based Search Algorithm (IPSA) can adaptively adjust the PID parameters in real time during the search process. Compared with conventional meta-heuristic optimization algorithms, this algorithm strikes a balance between search speed and stability, achieving a higher success rate in finding optimal values and faster convergence.
- The IPSA algorithm is successively applied to optimize the parameters of variational mode decomposition and to conduct a two-stage synergistic optimization strategy, focusing on the selection of the original load feature set and hyperparameter settings to improve prediction accuracy. This approach efficiently achieves an adaptive variational mode decomposition of the original load sequence and determines the optimal combination of input feature sets and hyperparameters for each load component.
- A multi-step rolling prediction strategy is implemented for each component, and the per-component predictions are superimposed and reconstructed to obtain the final result. Results on an actual Australian electricity load dataset show that the proposed method efficiently tunes the parameters at each stage of the hybrid prediction model, significantly improves accuracy over commonly used methods, and adapts better to load forecasting across different day types and seasons.
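The rolling multi-step strategy and the superposition-based reconstruction described above can be sketched as follows. This is a minimal illustration: `predict_fn` is a hypothetical stand-in for a trained per-component BiGRU-Attention model, and reconstruction is assumed to be a plain sum of component forecasts.

```python
import numpy as np

def rolling_forecast(predict_fn, history, horizon, window):
    """Multi-step rolling forecast: each one-step prediction is appended to
    the input window and used as input for the next step."""
    buf = list(np.asarray(history, float)[-window:])
    preds = []
    for _ in range(horizon):
        y = float(predict_fn(np.asarray(buf[-window:])))
        preds.append(y)
        buf.append(y)  # roll the window forward with the prediction
    return np.asarray(preds)

def reconstruct(component_preds):
    """Superimpose the per-component forecasts to recover the load forecast."""
    return np.sum(np.asarray(component_preds), axis=0)
```

With a toy one-step predictor (last value plus one), three rolling steps from a history ending at 9 yield 10, 11, 12; summing two component forecasts element-wise gives the reconstructed series.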
2. Materials and Methods
2.1. Improved Algorithm for PSA
2.1.1. PSA Algorithm
1. Population Initialization
2. Calculation of Population Bias
3. Population Individual Position Updates
2.1.2. Improved PSA Algorithm Based on Dynamic Regulation Strategy
2.2. Adaptive Variational Mode Decomposition
2.2.1. Variational Mode Decomposition
2.2.2. Adaptive Variational Mode Decomposition Based on IPSA
- (1) Configure the parameters of the IPSA algorithm and initialize the population, taking the energy entropy as the objective function.
- (2) Decompose the signal using VMD and evaluate the objective function values for each individual using Equation (16).
- (3) Compare the objective function values of the individuals, compute the population deviation, update the minimum objective function value, and update the positions of the population individuals according to the IPSA algorithm.
- (4) Repeat steps (2) and (3) until the objective function value is minimized or the maximum iteration count is reached. Finally, output the optimal parameters [K, α].
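The objective driving this search is the energy entropy of the decomposed modes. The sketch below is a stand-in for Equation (16); the VMD step itself (available in packages such as `vmdpy`) is assumed to produce the list of modes, so only the entropy computation is shown.

```python
import numpy as np

def energy_entropy(modes):
    """Energy entropy of a set of decomposed modes: each mode's share of the
    total energy is treated as a probability, and the Shannon entropy of that
    distribution is returned. Lower entropy means the energy is concentrated
    in fewer modes, indicating a cleaner decomposition."""
    energies = np.array([float(np.sum(m ** 2)) for m in modes])
    p = energies / energies.sum()
    return float(-np.sum(p * np.log(p + 1e-12)))

# In the full pipeline, IPSA proposes candidate [K, alpha] pairs, VMD is run
# with each candidate, and the pair minimizing this entropy is kept.
```

Two modes with equal energy give entropy ln 2; a decomposition where one mode carries all the energy gives entropy near zero.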
2.3. Feature Optimization and Model Hyperparameter Co-Tuning Methods
2.3.1. mRMR Algorithm
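The mRMR criterion selects features by maximizing relevance to the target while minimizing redundancy with features already chosen. The sketch below is a simplified greedy version with mutual information estimated by histogram binning; the bin count and the mean-redundancy form are illustrative assumptions rather than the exact estimator of the original paper.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of mutual information between two 1-D samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0) on empty cells
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mrmr_select(X, y, n_select, bins=8):
    """Greedy mRMR: at each step pick the feature maximizing
    relevance I(f; y) minus mean redundancy with the selected set."""
    n_feat = X.shape[1]
    relevance = [mutual_info(X[:, j], y, bins) for j in range(n_feat)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s], bins)
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

Given one informative feature, an exact duplicate of it, and pure noise, the duplicate is penalized for redundancy, so the noise column is picked second.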
2.3.2. BiGRU Model Based on Attention Mechanism
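The attention mechanism on top of the BiGRU can be reduced to its core operation: scoring each hidden state, normalizing the scores with a softmax, and taking the weighted sum as a context vector. The sketch below shows only this weighting step; the fixed query vector is a hypothetical placeholder for parameters that would be learned jointly with the network.

```python
import numpy as np

def attention_pool(h):
    """Attention over a sequence of hidden states h (shape [T, d]):
    scores -> softmax weights -> weighted sum (context vector)."""
    # Hypothetical query vector; in practice this is a trained parameter.
    q = np.ones(h.shape[1]) / np.sqrt(h.shape[1])
    scores = h @ q
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w = w / w.sum()
    context = w @ h                     # weighted combination of states
    return context, w
```

With two orthogonal unit hidden states the scores tie, so the weights are uniform and the context is their average.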
2.3.3. Methods of Co-Tuning Feature Optimization and Hyperparameter Based on IPSA
3. Short-Term Power Load Hybrid Prediction Model
3.1. Structure of the Short-Term Power Load Hybrid Prediction Model
3.2. Evaluation Indicators for the Model
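The case studies report RMSE, MAE, MAPE, and EV. A minimal sketch of these indicators follows; EV is interpreted here as explained variance, an assumption consistent with the values near 1 in the result tables.

```python
import numpy as np

def evaluation_metrics(y_true, y_pred):
    """RMSE and MAE in the units of the load (MW), MAPE in percent,
    and explained variance (EV)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100.0)
    ev = float(1.0 - np.var(err) / np.var(y_true))
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "EV": ev}
```

For y_true = [100, 200, 300] and y_pred = [110, 190, 300], the errors are ±10 and 0, giving MAE = 20/3, MAPE = 5%, and EV = 0.99.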
4. Case Studies
4.1. Data Preprocessing and Model Construction
4.1.1. Load Sequence Decomposition and Normalization
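Normalization of the load components can be sketched as a standard min-max scaling, with the inverse transform applied to the predictions before reconstruction. The exact scaler used in the paper is not specified, so this is an assumed but common choice; fitting on the training split only avoids leakage.

```python
import numpy as np

def minmax_fit(x):
    """Fit a min-max scaler (on the training split only) and return its bounds."""
    return float(np.min(x)), float(np.max(x))

def minmax_apply(x, lo, hi):
    """Scale values into [0, 1] using the fitted bounds."""
    return (np.asarray(x, float) - lo) / (hi - lo)

def minmax_invert(x, lo, hi):
    """Map normalized predictions back to the original load scale."""
    return np.asarray(x, float) * (hi - lo) + lo
```

Applying the scaler to [2, 4, 6] gives [0, 0.5, 1], and inverting recovers the original values exactly.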
4.1.2. Model Selection Comparison
- Type I Models: Single prediction models that consider the temporal characteristics of load, including LSTM, GRU, BiGRU (which accounts for bidirectional temporal features), and BiGRU-Attention neural networks. These models capture the temporal features of load data with a single neural network.
- Type II Models: Hybrid prediction methods that combine load data processing techniques with prediction models, including EEMD-BiGRU and VMD-BiGRU. These models first decompose the raw load data and then feed the resulting components into the prediction models.
- Type III Models: Hybrid prediction models that incorporate both hyperparameter optimization and attention mechanisms: FA-EEMD-BiGRU-Attention, based on the Firefly Algorithm (FA), and PSO-VMD-mRMR-BiGRU-Attention, based on Particle Swarm Optimization (PSO) [10]. FA and PSO are used to optimize the model hyperparameters, while the attention mechanism enhances the model's ability to capture important features. Selecting these three types of models as benchmarks comprehensively demonstrates the performance advantages of the proposed model in load forecasting and validates its effectiveness under different data processing and model optimization techniques.
4.1.3. Input Feature Matrix and Prediction Model Construction
4.2. Prediction Performance Validation and Comparison
4.2.1. Comparison of Predicted Results with Actual Load Curves
4.2.2. Comparison of Prediction Performance with Baseline Models
4.2.3. Adaptability to Different Seasons and Days
5. Conclusions
- (1) The dynamic adjustment strategy for PID parameters based on the rate of change of historical optimal function values improves the convergence speed and adaptability of the Improved PID-based Search Algorithm (IPSA). This lays a foundation for enhancing the optimization performance of the forecasting method.
- (2) By integrating the IPSA algorithm into adaptive variational mode decomposition and into the coordinated optimization of feature selection and model hyperparameters, the method efficiently searches for the optimal parameter values at each stage of the prediction model. This approach exploits the interdependence between feature selection and model hyperparameters, improving the capture of data characteristics while reducing both the need for manual intervention and its impact.
- (3) Compared with conventional methods, the proposed approach reduces forecasting complexity and achieves significantly higher prediction accuracy. It adapts better to load forecasting across different day types and seasons, yielding more stable prediction performance.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
1. Initialization:
(1) Initialize the population size N and the maximum number of iterations T, and configure the initial PID parameters Kp0, Ki0, and Kd0.
(2) Set the initial iteration count t = 1.
(3) Create the population and initialize the positions of the individuals in the population based on Equation (1).
2. Iterative optimization (while t ≤ T):
(1) Construct a vector of objective function values and compute the objective function value for each individual.
(2) Select the best individual and its corresponding best function value from the population: x*(t), f*(t).
(3) Calculate the rate-of-change factor of the historical best function value to dynamically adjust the PID parameters.
(4) Calculate the deviation of each population individual based on Equation (3).
(5) Calculate the integral and derivative deviation terms: when t = 1, initialize them; when t > 1, calculate them based on Equation (4).
(6) Calculate the control quantity based on Equation (10).
(7) Calculate the update quantity based on Equation (6).
3. Update population individual positions:
(1) Update the positions of the individuals in the population according to Equation (8).
(2) Set t = t + 1.
4. If the iteration count t > T, end the optimization process.
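The steps above can be condensed into a runnable sketch. The sigmoid gain scheduling, the decayed integral term, and the exploration noise are illustrative assumptions (the paper's Equations (1)-(10) are not reproduced here); the initial gains [1, 0.5, 1.2] follow the parameter table's [Kp0, Ki0, Kd0].

```python
import numpy as np

def ipsa_minimize(f, lo, hi, n=30, t_max=100, seed=0):
    """Hedged sketch of the IPSA loop: a PID-style update acts on the
    deviation between each individual and the historical best, with gains
    rescaled by the rate of change of the historical best value."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = rng.uniform(lo, hi, size=(n, lo.size))   # 1-(3): initialize positions
    kp0, ki0, kd0 = 1.0, 0.5, 1.2                # [Kp0, Ki0, Kd0]
    e_prev = np.zeros_like(x)
    e_int = np.zeros_like(x)
    g_x, g_val = x[0].copy(), float("inf")
    history = []
    for t in range(t_max):
        vals = np.array([f(xi) for xi in x])     # 2-(1): objective values
        i = int(np.argmin(vals))
        prev_val = g_val
        if vals[i] < g_val:                      # 2-(2): historical best
            g_val, g_x = float(vals[i]), x[i].copy()
        history.append(g_val)
        # 2-(3): rate-of-change factor of the historical best -> dynamic gains
        r = abs(prev_val - g_val) / abs(prev_val) if np.isfinite(prev_val) and prev_val != 0 else 1.0
        s = 1.0 / (1.0 + np.exp(-r))             # assumed squashing of the factor
        kp, ki, kd = kp0 * s, ki0 * s, kd0 * s
        e = g_x - x                              # 2-(4): population deviation
        e_int = 0.9 * e_int + e                  # decayed integral term (assumption)
        de = e - e_prev                          # derivative term
        e_prev = e
        step = kp * e + ki * e_int + kd * de     # PID control signal
        noise = 0.1 * (hi - lo) * rng.standard_normal(x.shape) / (t + 1)
        x = np.clip(x + step + noise, lo, hi)    # 3-(1): update positions
    return g_x, history
```

On a 2-D sphere function the best-so-far value is non-increasing by construction and shrinks toward zero as the decaying noise refines the population around the best solution.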
| Model | Parameters | Values |
|---|---|---|
| GRU | Hidden layer lengths [x1, x2]; learning rate lr/10−3; regularization coefficient λ/10−5; dropout rate dout | [128, 64]; 4; 1; 0.1 |
| BiGRU | Hidden layer lengths [x1, x2]; learning rate lr/10−3; regularization coefficient λ/10−5; dropout rate dout | [128, 64]; 4; 2; 0.2 |
| BiGRU-Attention | Hidden layer lengths [x1, x2]; learning rate lr/10−3; regularization coefficient λ/10−5; dropout rate dout | [128, 64]; 5; 5; 0.2 |
| EEMD-BiGRU | Hidden layer lengths [x1, x2]; learning rate lr/10−3; regularization coefficient λ/10−5; dropout rate dout | [128, 64]; 4; 2; 0.2 |
| VMD-BiGRU | VMD decomposition parameters [K, α]; hidden layer lengths [x1, x2]; learning rate lr/10−3; regularization coefficient λ/10−5; dropout rate dout | [5, 2240]; [128, 64]; 4; 2; 0.2 |
| FA-EEMD-BiGRU-Attention | Hidden layer lengths [x1, x2]; learning rate lr/10−3; regularization coefficient λ/10−5; dropout rate dout | [119, 100]; 1.201; 2.132; 0.2 |
| PSO-VMD-mRMR-BiGRU-Attention | VMD decomposition parameters [K, α]; hidden layer lengths [x1, x2]; learning rate lr/10−3; regularization coefficient λ/10−5; dropout rate dout | [5, 2240]; [126, 102]; 2.117; 6.128; 0.2 |
| Common Parameters | Number of hidden layers of the deep learning models: 2; batch size: 256; number of training epochs: 50; prediction step size: 48; loss function: MSE; m = 5 in the cross-validation strategy; input features: the full original feature set; hyperparameter optimization settings: T = 30, N = 50; objective function of all optimization algorithms: MMSE; the cross-validation strategy is applied to all models | |
References
1. Zhu, J.; Dong, H.; Li, S.; Chen, Z.; Luo, T. Review of data-driven load forecasting for integrated energy system. Proc. CSEE 2021, 41, 7905–7924.
2. Han, X.; Li, T.; Zhang, D.; Zhou, X. New issues and key technologies of new power system planning under double carbon goals. High Volt. Eng. 2021, 47, 3036–3046.
3. Yang, Z.; Liu, J.; Liu, Y.; Wen, L.; Wang, Z.; Ning, S. Transformer load forecasting based on adaptive deep belief network. Proc. CSEE 2019, 39, 4049–4061.
4. Pan, F.; Cheng, H.; Yang, J.; Zhang, C.; Pan, Z. Power system short-term load forecasting based on support vector machines. Power Syst. Technol. 2004, 28, 39–42.
5. Li, L.; Wei, J.; Li, C.; Cao, Y.; Fang, B. Prediction of load model based on artificial neural network. Trans. China Electrotech. Soc. 2015, 30, 225–230.
6. Chen, W.; Hu, Z.; Yue, Q.; Du, Y.; Qi, Q. Short-term load prediction based on combined model of long short-term memory network and light gradient boosting machine. Autom. Electr. Power Syst. 2021, 45, 91–97.
7. Wang, Z.; Zhao, B.; Ji, W.; Gao, X.; Li, X. Short-term load forecasting method based on GRU-NN model. Autom. Electr. Power Syst. 2019, 43, 53–58.
8. Stuke, A.; Rinke, P.; Todorović, M. Efficient hyperparameter tuning for kernel ridge regression with Bayesian optimization. Mach. Learn. Sci. Technol. 2021, 2, 035022.
9. Liu, J.; Yang, Y.; Lv, S.; Wang, J.; Chen, H. Attention-based BiGRU-CNN for Chinese question classification. J. Ambient Intell. Humaniz. Comput. 2019, 1–12.
10. Kong, X.; Li, C.; Zheng, F.; Yu, L.; Ma, X. Short-term load forecasting method based on empirical mode decomposition and feature correlation analysis. Autom. Electr. Power Syst. 2019, 43, 46–52.
11. Deng, D.; Li, J.; Zhang, Z.; Teng, Y.; Huang, Q. Short-term electric load forecasting based on EEMD-GRU-MLR. Power Syst. Technol. 2020, 44, 593–602.
12. Lv, L.; Wu, Z.; Zhang, J.; Zhang, L.; Tan, Z.; Tian, Z. A VMD and LSTM based hybrid model of load forecasting for power grid security. IEEE Trans. Ind. Inform. 2021, 18, 6474–6482.
13. Liang, Z.; Sun, G.Q.; Li, H.C.; Wei, Z.N.; Zang, H.Y.; Zhou, Y.Z.; Chen, S. Short-term load forecasting based on VMD and PSO optimized deep belief network. Power Syst. Technol. 2018, 42, 598–606.
14. Hu, W.; Zhang, X.; Li, Z.; Li, Q.; Wang, H. Short-term load forecasting based on an optimized VMD-mRMR-LSTM model. Power Syst. Prot. Control 2022, 50, 88–97.
15. Li, J.; Wang, P.; Li, W. A hybrid multi-strategy improved sparrow search algorithm. Comput. Eng. Sci. 2024, 46, 303–315.
16. Yi, Y.; Lou, S. Short-term power load forecasting based on sequence component recombination and temporal self-attention mechanism improved TCN-BiLSTM. Proc. CSU-EPSA 2024, 1–11.
17. Yan, X.; Qin, C.; Ju, P.; Cao, L.; Li, J. Optimal feature selection of load power models. Electr. Power Eng. Technol. 2021, 40, 84–91.
18. Lu, J.; Liu, J.; Luo, Y.; Zeng, J. Small sample load forecasting method considering characteristic distribution similarity based on improved WGAN. Control Theory Appl. 2024, 41, 597–608.
19. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238.
20. Liang, Y.; Niu, D.; Hong, W.-C. Short term load forecasting based on feature extraction and improved general regression neural network model. Energy 2019, 166, 653–663.
21. Dai, Y.; Yang, X.; Leng, M. Forecasting power load: A hybrid forecasting method with intelligent data processing and optimized artificial intelligence. Technol. Forecast. Soc. Chang. 2022, 182, 121858.
22. Jiao, L.; Zhou, K.; Zhang, Z.; Han, F.; Luo, L.; Luo, Z. Dual-stage feature selection for short-term load forecasting based on mRMR-IPSO. J. Chongqing Univ. 2024, 47, 98–109.
23. Yue, X.; Zhu, R.; Gong, X. Short-term multidimensional time series photovoltaic power prediction using a multi-strategy optimized long short-term memory neural network. Proc. CSU-EPSA 2024, 1, 1–12.
24. Gao, Y. PID-based search algorithm: A novel metaheuristic algorithm based on PID algorithm. Expert Syst. Appl. 2023, 232, 120886.
25. Gai, J.; Shen, J.; Hu, Y.; Wang, H. An integrated method based on hybrid grey wolf optimizer improved variational mode decomposition and deep neural network for fault diagnosis of rolling bearing. Measurement 2020, 162, 107901.
26. Li, D.; Sun, G.; Miao, S.; Zhang, K.; Tan, Y.; Zhang, Y.; He, S. A short-term power load forecasting method based on multidimensional temporal information fusion. Proc. CSEE 2023, 43, 94–106.
27. Zhao, J.; Jie, Z.; Liu, S. A tide prediction accuracy improvement method research based on VMD optimal decomposition of energy entropy and GRU recurrent neural network. Chin. J. Sci. Instrum. 2023, 44, 79–87.
| Dataset | Actual Electricity Load Data from New South Wales, Australia |
|---|---|
| Sample Time | 1 March 2007 to 29 February 2008 |
| Dataset Division | The last week of each season is the test set, and the remaining weeks are the training set. The validation set is 20% of the training set |
| Composition | Primarily industrial load |
| Time Interval | 30 min |
| Historical Duration | 336 points (7 days) |
| Prediction Duration | 48 points (1 day) |
| Original Input Features | Calendar features; meteorological features; historical price features; historical load features |
| Output | |
| Parameters | Value |
|---|---|
| Population Size | 50 |
| Iterations | 20 |
| Objective Function | MMSE |
| [Kp0, Ki0, Kd0] | [1, 0.5, 1.2] |
| Number of Features (N0) | [1, 20] |
| Length of Each Hidden Layer (xi) | [100, 300] |
| Learning Rate (lr) | [10−5, 10−2] |
| Regularization Coefficient (λ) | [10−5, 10−2] |
| Dropout Rate (dout) | [0.1, 0.2, …, 0.5] |
| imf | {x1, x2} | | /10−4 | /10−3 |
|---|---|---|---|---|
| imf1 | {127, 300} | 0.01 | 1.142 | 1.155 |
| imf2 | {287, 277} | 0.05 | 1.725 | 3.697 |
| imf3 | {100, 124} | 0.02 | 9.141 | 1.503 |
| imf4 | {131, 100} | 0.04 | 2.154 | 1.983 |
| imf5 | {112, 138} | 0.05 | 5.127 | 3.435 |
| imf | N0 | Optimal Input Feature Set |
|---|---|---|
| imf1 | 16 | Lt−7, Lt−1, Lt−2, L10.5, T0, W0, Pt−1, Tt−1, Wt−1, Pt−7, Tt−7, Hday, Lt−6, Sseason, Sweek, L22.5 |
| imf2 | 10 | Lt−1, Lt−7, Shour, Lt−2, T0, Wt−1, H, Sweek, Pt−1, Smonth |
| imf3 | 12 | Lt−1, Lt−7, Lt−5, T0, Lt−2, Shour, Tt−7, W0, Sday, Tt−7, Sweek, Smonth, L17.5, Wt−1 |
| imf4 | 5 | Lt−1, Lt−7, Lt−2, Lt−5, Shour |
| imf5 | 2 | Lt−1, Lt−7 |
| Model | RMSE/MW (Weekdays) | RMSE/MW (Weekends) | MAE/MW (Weekdays) | MAE/MW (Weekends) | MAPE/% (Weekdays) | MAPE/% (Weekends) | EV (Weekdays) | EV (Weekends) |
|---|---|---|---|---|---|---|---|---|
| LSTM | 433.24 | 384.20 | 325.74 | 297.83 | 4.05 | 3.74 | 0.90 | 0.93 |
| GRU | 440.91 | 391.00 | 331.51 | 303.10 | 4.18 | 3.86 | 0.89 | 0.95 |
| BiGRU | 379.31 | 336.37 | 285.20 | 260.75 | 3.15 | 2.90 | 0.91 | 0.94 |
| BiGRU-Attention | 354.97 | 314.79 | 266.89 | 244.02 | 3.06 | 2.83 | 0.91 | 0.96 |
| Means of Type I Models | 391.73 | 347.39 | 302.34 | 276.43 | 3.46 | 3.20 | 0.90 | 0.95 |
| EEMD-BiGRU | 287.43 | 254.89 | 216.11 | 197.59 | 2.69 | 2.48 | 0.96 | 0.94 |
| VMD-BiGRU | 343.39 | 304.51 | 258.19 | 236.05 | 2.48 | 2.29 | 0.95 | 0.95 |
| Means of Type II Models | 315.41 | 279.70 | 237.15 | 216.82 | 2.59 | 2.39 | 0.96 | 0.95 |
| FA-EEMD-BiGRU-Attention | 237.02 | 210.19 | 178.21 | 162.94 | 2.24 | 2.07 | 0.97 | 0.98 |
| PSO-VMD-mRMR-BiGRU-Attention | 247.36 | 219.36 | 185.98 | 170.05 | 2.34 | 2.16 | 0.96 | 0.98 |
| Means of Type III Models | 242.19 | 214.78 | 182.10 | 166.49 | 2.29 | 2.12 | 0.97 | 0.98 |
| Means of All Comparison Models | 340.45 | 301.91 | 255.98 | 234.04 | 3.02 | 2.79 | 0.93 | 0.95 |
| The Proposed Model | 202.68 | 179.74 | 152.39 | 139.33 | 2.05 | 1.93 | 0.97 | 0.98 |
| Model | RMSE/MW (Spring) | RMSE/MW (Summer) | RMSE/MW (Autumn) | RMSE/MW (Winter) | MAE/MW (Spring) | MAE/MW (Summer) | MAE/MW (Autumn) | MAE/MW (Winter) | MAPE/% (Spring) | MAPE/% (Summer) | MAPE/% (Autumn) | MAPE/% (Winter) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSTM | 404.76 | 425.00 | 401.16 | 395.96 | 306.64 | 319.55 | 306.23 | 306.95 | 3.86 | 4.05 | 3.90 | 3.78 |
| GRU | 410.12 | 429.00 | 405.18 | 410.12 | 310.70 | 322.56 | 309.30 | 317.92 | 3.91 | 4.09 | 3.95 | 4.12 |
| BiGRU | 352.49 | 370.11 | 360.28 | 352.49 | 267.04 | 278.28 | 275.02 | 273.25 | 2.98 | 3.13 | 3.01 | 2.98 |
| BiGRU-Attention | 335.69 | 336.03 | 335.69 | 332.12 | 254.31 | 252.65 | 256.25 | 257.46 | 2.95 | 2.95 | 2.95 | 2.92 |
| Means of Type I Models | 375.77 | 390.04 | 375.58 | 372.67 | 284.67 | 293.26 | 286.70 | 288.89 | 3.43 | 3.56 | 3.45 | 3.45 |
| EEMD-BiGRU | 268.44 | 276.50 | 268.44 | 271.27 | 203.36 | 207.89 | 204.92 | 210.29 | 2.56 | 2.64 | 2.56 | 2.59 |
| VMD-BiGRU | 324.62 | 328.70 | 321.24 | 321.24 | 245.92 | 247.14 | 245.22 | 249.02 | 2.39 | 2.42 | 2.37 | 2.37 |
| Means of Type II Models | 296.53 | 302.60 | 294.84 | 296.26 | 224.64 | 227.52 | 225.07 | 229.66 | 2.48 | 2.53 | 2.47 | 2.48 |
| FA-EEMD-BiGRU-Attention | 220.85 | 231.89 | 220.85 | 220.85 | 167.31 | 174.35 | 168.59 | 171.20 | 2.13 | 2.24 | 2.13 | 2.13 |
| PSO-VMD-mRMR-BiGRU-Attention | 229.14 | 243.66 | 231.50 | 229.14 | 173.59 | 183.20 | 176.72 | 177.63 | 2.21 | 2.35 | 2.23 | 2.21 |
| Means of Type III Models | 225.00 | 237.78 | 226.18 | 225.00 | 170.45 | 178.78 | 172.65 | 174.41 | 2.17 | 2.30 | 2.18 | 2.17 |
| Means of All Comparison Models | 318.26 | 330.11 | 318.04 | 316.65 | 241.11 | 248.20 | 242.78 | 245.46 | 2.87 | 2.98 | 2.89 | 2.89 |
| The Proposed Model | 183.72 | 201.15 | 181.86 | 198.11 | 139.18 | 151.24 | 138.82 | 153.57 | 1.91 | 2.11 | 1.89 | 2.07 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, Z.; Zheng, S.; Li, K. Short-Term Power Load Forecasting Method Based on Feature Selection and Co-Optimization of Hyperparameters. Energies 2024, 17, 3712. https://doi.org/10.3390/en17153712