1. Introduction
Electric load forecasting is an important aspect of modern power system management and a key research focus of power companies [1]. It comprises long-term, medium-term, and short-term forecasting, depending on the specific goals [2]. Notably, short-term load forecasting plays an important role in power generation planning and enables relevant departments to establish appropriate power dispatching plans [3,4], which is crucial for maintaining the safe and stable operation of the power system and enhancing its social benefits [5]. In addition, it facilitates the growth of the power market and boosts economic benefits [6]. Therefore, devising an effective and precise method for short-term load forecasting is of significant importance.
With the need for accurate energy forecasting in mind, various forecasting methods have been developed. Early studies produced several models for short-term power load forecasting, including the Auto-Regressive (AR), Auto-Regressive Moving Average (ARMA), and Auto-Regressive Integrated Moving Average (ARIMA) models. A case in point is the work of Chen et al. [7], who employed the ARMA model for short-term power load forecasting. This method utilizes observed data as the initial input, and its fast algorithm produces predicted load values that are in line with the trend in load variation. However, it falls short in accounting for the factors that affect such variation, leaving room for improvement in prediction accuracy.
In recent years, scholars have turned to machine learning [8] and deep learning [9] to improve electric load forecasting accuracy and uncover complex data patterns. Among traditional machine learning algorithms, the Support Vector Machine (SVM) [10] is the most widely used in the field of electric load forecasting. Its advantages include the need for relatively few training samples and interpretable features. Hong [11] and Fan et al. [12] have demonstrated the high accuracy of SVM in short-term electric load forecasting. However, as the smart grid continues to develop, power load data have become increasingly numerous and multifaceted, and SVM is confronted with the challenge of slow computation in such situations. Compared to traditional machine learning methods, deep learning methods exhibit stronger fitting capacity and produce better results. Currently, a diverse set of deep learning approaches have been applied to load forecasting, including the Gated Recurrent Unit (GRU) [13], Temporal Convolutional Network (TCN) [14], and Long Short-Term Memory (LSTM) [15], as well as other deep learning methods [9,16]. Compared to traditional Recurrent Neural Networks (RNN) and LSTM, the GRU delivers better forecasting results and faster running speed in short-term load forecasting. Wang et al. [17] used the GRU algorithm to extract and learn the temporal characteristics of load consumption; their results showed that predictive accuracy improved by more than 10% compared to RNN. Cai [18] found that the GRU uses fewer parameters while preserving important features, resulting in faster running speeds than LSTM. Imani [19] utilized a Convolutional Neural Network (CNN) to extract the nonlinear relationships of residential loads and achieved remarkably precise outcomes. Song et al. [20] devised a thermal load prediction model utilizing TCN networks, which facilitated the extraction of complex data features and enabled precise load prediction.
Since single prediction models are insufficient, in terms of applicability scenarios and prediction accuracy, to achieve optimal results [21], a considerable amount of literature has employed hybrid models for prediction. Hybrid models combine data preprocessing, feature selection, optimization algorithms, decomposition algorithms, and other technologies to fully utilize the benefits of disparate methods and improve load power prediction accuracy. Research has revealed that the decomposition method and the ensemble learning method are particularly advantageous among hybrid models [22].
According to frequency analysis, the electric load exhibits clear cyclical patterns that result from the underlying superposition of multiple components with varying frequencies [23]. Therefore, decomposing time series has become a widely employed method in the area of electric load forecasting. Sun [24] proposed a short-term load forecasting model utilizing Ensemble Empirical Mode Decomposition (EEMD) and neural networks, considering wind power grid connections, and verified that EEMD achieves better decomposition effects than wavelet decomposition. Liu et al. [25] utilized Variational Mode Decomposition (VMD) to decompose load sequences and developed a hybrid forecasting model for accurate prediction, achieving an accuracy of 99.15%. Irene et al. [26] employed a hybrid prediction model combining Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) to enhance the accuracy of energy consumption prediction.
The ensemble learning method combines multiple sets of data with multiple individual learners, whether independent or identical, which have different distributions, to improve predictive performance [27]. Popular ensemble learning algorithms include boosting, bagging, and stacking. Ensemble learning is commonly conducted via stacking-based or weight-based strategies [28]. Rho et al. [29] used a stacking ensemble approach to merge short-term load forecasting models and more accurately predict building electric energy consumption. Massaoudi et al. [30] proposed a stacked XGB-LGBM-MLP model to cope with stochastic variations in load demand. Bento et al. [31] presented an automatic framework using a deep learning-based stacking methodology to select the best Box–Jenkins models for 24 h ahead load forecasting from a wide range of combinations.
Although the above load power prediction models achieve a satisfactory forecasting effect, some limitations persist, and there is still room for improvement. Firstly, current short-term load forecasting models seldom consider detecting and correcting outliers in the original data; studies have demonstrated that adopting outlier correction can significantly improve the performance of pollution forecasting [32]. Secondly, the combination weights of existing load power ensemble prediction models lack diversity and should take into account different weight distribution strategies for the prediction results generated by different base learners. The literature shows that weight ensembles based on reinforcement learning can offer advantages in wind speed prediction [33,34].
To address the aforementioned research gaps, this paper presents a short-term load forecasting model (HI-CEEMDAN-Q-TEG) based on outlier correction, decomposition, and ensemble reinforcement learning. The contributions and novelty of this paper are summarized as follows:
This paper employs an outlier detection method to correct outliers in the original load power data. Such outliers may arise from human error or other causes, and directly feeding the unprocessed data into the model could degrade training. To identify and correct outliers in the data, this paper utilizes the Hampel identifier (HI) algorithm. This step is crucial because it supplies the forecasting model with reliable nonlinear information from the data;
This paper utilizes a decomposition method to fully extract the waveform characteristics of the data. Specifically, the CEEMDAN method is utilized to decompose the raw non-stationary load power data. By decomposing the load power data into multiple sub-sequences through CEEMDAN, the waveform characteristics of the data can be extracted thoroughly, ultimately enhancing the performance of the predictor;
This paper introduces an ensemble learning algorithm based on reinforcement learning. It is necessary to consider varying weights when combining preliminary predictions from different base learners. This study employs three single models to predict processed load power data, followed by the utilization of the Q-learning method to obtain cluster weights that are suitable for the ensemble forecast. Compared to other ensemble learning algorithms, the Q-learning method deploys agents to learn in the environment through trial and error, resulting in an innovative and superior method.
2. Methodology
2.1. Framework of the Proposed Model
This study presents a novel forecasting model, namely the HI-CEEMDAN-Q-TEG, for predicting load power. The model framework, as depicted in Figure 1, consists of three distinct steps with specific details as follows:
Step 1: Using HI to detect and correct outliers. The original load power data is characterized by fluctuations, randomness, and nonlinearity; therefore, outliers can arise as a result of either equipment or human factors. By using HI, outliers can be identified and corrected in the training set, which eliminates the likelihood of their interference with model training. This approach serves as a valuable tool for enhancing the precision of load power prediction;
Step 2: Applying CEEMDAN to decompose original data into subseries. Given its prominent cyclical characteristics, the load power data can be perceived, from a frequency domain perspective, as a composite of several components with varying frequencies. The CEEMDAN method can adaptively decompose this data into multiple subseries, thereby reducing the model’s non-stationarity and enhancing the predictor’s modeling efficiency and capacity;
Step 3: Using the Q-learning ensemble method for prediction. The load power data prediction is achieved by employing three base learners: the temporal convolutional network (TCN), gated recurrent unit (GRU), and extreme learning machine (ELM), collectively referred to as TEG. After outlier correction, the TEG is used to make accurate predictions. Ensemble weights for the different single models are determined using the Q-learning method, which updates the weights repeatedly through trial-and-error learning, thereby optimizing the diversity and appropriateness of the ensemble weights.
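The three steps above can be sketched end to end. The following Python fragment is only an illustration of the data flow, not the paper's implementation: a moving-average split stands in for CEEMDAN, naive persistence forecasts stand in for TCN, ELM, and GRU, and a fixed weight vector stands in for the Q-learning output.

```python
import numpy as np

rng = np.random.default_rng(0)
load = np.sin(np.linspace(0, 8 * np.pi, 200)) + 0.1 * rng.standard_normal(200)

# Step 2 (stand-in): split the series into a smooth and a residual subseries;
# CEEMDAN would instead produce several IMFs adaptively.
trend = np.convolve(load, np.ones(5) / 5, mode="same")
subseries = [trend, load - trend]

# Step 3 (stand-in): three naive "base learners" per subseries, combined
# with hypothetical ensemble weights (fixed here, not learned by Q-learning).
weights = np.array([0.5, 0.3, 0.2])

forecast = 0.0
for s in subseries:
    preds = np.array([s[-1], s[-2], s[-3]])  # lag-1/2/3 persistence forecasts
    forecast += weights @ preds              # weighted ensemble, then aggregate

print(forecast)
```

Summing the per-subseries ensemble forecasts reconstructs the prediction for the original series, which is the aggregation step implied by the decomposition framework.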
2.2. Hampel Identifier
HI is a widely used method for detecting and correcting outliers [35]. Owing to its excellent effectiveness, many researchers employ this method. To apply the HI algorithm to input data $x = \{x_1, x_2, \ldots, x_n\}$, set the sliding window half-length to $k$. For each sample $x_i$, obtain the median $m_i$, as well as the median absolute deviation (MAD), from the $2k+1$ samples centered on $x_i$. Set the evaluation parameter to $\kappa$, and calculate the standard deviation estimate $\sigma_i$ using the MAD and $\kappa$ [36]. The formulas for calculating $m_i$, $\mathrm{MAD}_i$, and $\sigma_i$ are as follows [32]:

$$m_i = \operatorname{median}\left(x_{i-k}, \ldots, x_i, \ldots, x_{i+k}\right)$$

$$\mathrm{MAD}_i = \operatorname{median}\left(\left|x_{i-k} - m_i\right|, \ldots, \left|x_{i+k} - m_i\right|\right)$$

$$\sigma_i = \kappa \cdot \mathrm{MAD}_i$$

Based on the $3\sigma$ statistical rule, if the difference between a sample value and the window median exceeds three standard deviations, the window median replaces the sample [37]:

$$x_i^{\ast} = \begin{cases} x_i, & \left|x_i - m_i\right| \leq 3\sigma_i \\ m_i, & \left|x_i - m_i\right| > 3\sigma_i \end{cases}$$
The use of HI allows for the outliers to be corrected in the raw data, which, if left untreated, could potentially disrupt the model training process. The incorporation of HI into data preprocessing leads to an enhanced nonlinear fitting performance of the data.
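As a concrete illustration, a minimal numpy implementation of the Hampel identifier might look as follows. The window half-length and threshold are demonstration choices, not the paper's settings; the factor 1.4826 is the standard MAD-to-sigma scale for Gaussian data.

```python
import numpy as np

def hampel_filter(x, k=3, n_sigma=3.0, kappa=1.4826):
    """Replace any sample deviating from its window median by more than
    n_sigma MAD-based standard deviations with that median."""
    x = np.asarray(x, dtype=float).copy()
    for i in range(len(x)):
        lo, hi = max(0, i - k), min(len(x), i + k + 1)     # window around x[i]
        med = np.median(x[lo:hi])
        sigma = kappa * np.median(np.abs(x[lo:hi] - med))  # sigma from MAD
        if abs(x[i] - med) > n_sigma * sigma:
            x[i] = med                                     # correct the outlier
    return x

series = [1.0, 1.1, 0.9, 50.0, 1.0, 1.2, 0.8]
cleaned = hampel_filter(series)
print(cleaned)  # the spike at index 3 is replaced by the window median
```

Because the median and MAD are robust statistics, a single large spike barely shifts the threshold, so the spike itself is flagged while the surrounding samples pass untouched.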
2.3. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise
CEEMDAN is a decomposition algorithm used to analyze time series data for nonlinearity and non-stationarity [38]. By smoothing the overall data and extracting information about multiple frequencies from the original data, CEEMDAN can decompose the data into sub-sequences carrying different frequency and time information. The CEEMDAN algorithm is adaptive, meaning it can automatically select the appropriate noise level based on the unique characteristics of a given signal. This adaptability and robustness make the CEEMDAN algorithm ideal for processing nonlinear and non-stationary signals [39].
Building on the EMD algorithm, the CEEMDAN algorithm makes the decomposition process more stable and accurate by introducing a noise signal. Meanwhile, it adopts multiple decompositions and averaging to improve the accuracy and stability of the signal decomposition [40].
The CEEMDAN algorithm has the advantage of solving mutual interference and noise interference problems between intrinsic mode functions (IMFs). This leads to improved accuracy and stability of signal decomposition.
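In practice, CEEMDAN is usually run via an existing implementation (for example, the PyEMD package). The numpy-only fragment below does not implement CEEMDAN itself; it merely illustrates its core noise-assisted ensemble principle: adding independent white-noise realizations and averaging the resulting decompositions cancels the injected noise. A moving-average residual stands in for a true EMD sifting step, and all signal parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

def high_freq_part(x):
    """Stand-in for one EMD sifting step: residual of a short moving average."""
    return x - np.convolve(x, np.ones(11) / 11, mode="same")

# Noise-assisted ensemble: decompose many noisy copies, then average.
trials = 200
imf1 = np.mean(
    [high_freq_part(signal + 0.2 * rng.standard_normal(len(t)))
     for _ in range(trials)],
    axis=0,
)

mse = np.mean((imf1 - high_freq_part(signal)) ** 2)
print(mse)  # small: the injected noise averages out across the ensemble
```

Averaging over the ensemble reduces the variance of the injected noise roughly in proportion to the number of trials, which is why the averaged component stays close to the noise-free one.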
2.4. Base Learners
2.4.1. Temporal Convolutional Network
The TCN algorithm is a commonly used convolutional network for time series prediction [41]. Because of the causal relationship in load data over time, the prediction at time $t$ depends on previous time steps, and the TCN network effectively maintains this temporal order and causality. TCN consists of three parts: causal convolution, dilated (expansion) convolution, and residual connections.
In TCN, causal convolution ensures that the output of the upper layers of the network at time $t$ depends only on the inputs of the lower layers at or before time $t$. Dilated convolution involves setting the dilation factor hyperparameter to adjust the convolutional interval. To reduce the limitations of downward transmission after nonlinear transformation in the original network structure, TCN adds multiple direct channels, allowing the input information to be transmitted directly to later layers.
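The causal and dilated convolutions described above can be sketched in a few lines of numpy. This is an illustrative fragment, not the paper's implementation; a full TCN would stack such layers with residual connections.

```python
import numpy as np

def causal_dilated_conv(x, weights, dilation=1):
    """Output at time t combines x[t], x[t-d], x[t-2d], ...; left-padding
    with zeros guarantees no future sample influences the output."""
    k = len(weights)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for t in range(len(x)):
        taps = xp[t + pad - np.arange(k) * dilation]  # current and past taps
        y[t] = weights @ taps
    return y

x = [1.0, 2.0, 3.0, 4.0]
print(causal_dilated_conv(x, np.array([0.0, 1.0]), dilation=1))  # [0. 1. 2. 3.]
```

With weights `[0, 1]`, the output is the input shifted right by `dilation` steps, which makes the causality easy to verify: no output value ever draws on a later input.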
2.4.2. Extreme Learning Machine
ELM is an efficient artificial neural network whose principle is based on fully random projection and the least squares method [42]. Fully random projection refers to projecting the input data into a high-dimensional space, which increases the separability of the data in the feature space [43]. Through random initialization of the weights of the input and hidden layers, the ELM algorithm can minimize the training error very quickly, facilitating rapid learning and prediction.
ELM can be expressed mathematically as follows [32]:

$$f(x_j) = \sum_{i=1}^{L} \beta_i \, g\left(w_i \cdot x_j + b_i\right)$$

where $\beta$ represents the output weight matrix, $g(\cdot)$ represents the activation function, $w$ represents the input weight matrix, and $b$ represents the bias vector.

With $H$ representing the hidden-layer output matrix and $T$ representing the true value matrix, the matrix expression for the Extreme Learning Machine (ELM) is as follows:

$$H\beta = T$$

where $H$ is a matrix whose rows represent the output of the hidden layer for each input sample, and $\beta$ is the matrix of output weights.
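The two equations above translate directly into code: the hidden weights are drawn at random and never trained, and only $\beta$ is solved for by least squares. The following numpy sketch uses a toy regression task and an arbitrary hidden-layer size, both illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy regression task (illustrative): learn y = sin(x) from 200 samples.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)

L = 50                                      # hidden neurons (arbitrary choice)
W = rng.standard_normal((X.shape[1], L))    # random input weights, never trained
b = rng.standard_normal(L)                  # random biases, never trained
H = np.tanh(X @ W + b)                      # hidden-layer output matrix

beta = np.linalg.pinv(H) @ T                # least-squares solve of H @ beta = T

mse = np.mean((H @ beta - T) ** 2)          # training error
print(mse)
```

Because training reduces to a single pseudoinverse, the fit is obtained in one step, which is the source of ELM's speed advantage noted above.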
2.4.3. Gate Recurrent Unit
In 2014, Cho proposed the Gated Recurrent Unit (GRU) as an improvement on Long Short-Term Memory (LSTM) [44]. The GRU has two gates: the reset gate, which determines how much historical information enters the candidate state, and the update gate, which determines how much historical information is retained in the current state. Compared to the LSTM, the GRU uses fewer parameters while preserving the important features, resulting in faster running speeds.
The formulas for the update gate and reset gate are as follows:

$$z_t = \sigma\left(W_z \cdot \left[h_{t-1}, x_t\right]\right)$$

$$r_t = \sigma\left(W_r \cdot \left[h_{t-1}, x_t\right]\right)$$

where $x_t$ represents the current input value; $h_{t-1}$ represents the previous hidden state; and $W_z$, $W_r$ represent the weight matrices.
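One common formulation of these gate equations can be sketched in numpy as a single time step. The dimensions are hypothetical, bias terms are omitted for brevity, and the blend `(1 - z) * h_prev + z * h_tilde` is one of two equivalent sign conventions found in the literature.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step acting on the concatenation [h_prev, x_t]."""
    v = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ v)                       # update gate
    r = sigmoid(W_r @ v)                       # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))  # candidate
    return (1 - z) * h_prev + z * h_tilde      # blend old state and candidate

# Run a few steps with hypothetical sizes (hidden = 4, input = 3).
rng = np.random.default_rng(0)
W_z, W_r, W_h = (rng.standard_normal((4, 7)) for _ in range(3))
h = np.zeros(4)
for _ in range(5):
    h = gru_step(rng.standard_normal(3), h, W_z, W_r, W_h)
print(h)
```

Since the new state is a convex combination of the bounded candidate and the previous state, the hidden state stays bounded, which keeps the recurrence stable over long sequences.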
2.5. Ensemble Reinforcement Learning Method
As a distinct machine learning paradigm, reinforcement learning differs from supervised and unsupervised learning in that an agent continuously interacts with the environment, which guides subsequent actions by providing feedback in the form of rewards, with the aim of maximizing the cumulative reward [45]. The Q-learning method is a value-based reinforcement learning algorithm [46]. Q-learning generates a Q-value table that captures the relationship between each state and the actions taken in it; each value in this table represents the reward obtained for an action taken in a given state.
The Q-table approach selects the action with the highest potential reward and uses a penalty-and-reward mechanism to keep updating the Q-table until the optimal result is achieved, that is, until a stopping condition is met, signifying that the algorithm has found the optimal action for each state [47]. In this study, we employ the Q-learning method to combine the forecasting outcomes of TCN, ELM, and GRU. As a result, different ensemble weights are generated for each base learner, effectively addressing the weak robustness associated with a single weight as well as a single model.
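A minimal single-state sketch of this idea in numpy: candidate weight vectors act as actions, the negative forecast error on a validation window is the reward, and the Q-table is updated by trial and error. The data, grid resolution, and hyperparameters here are all illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical validation window: truth plus three noisy predictions
# standing in for the TCN, ELM, and GRU base learners.
y_true = np.sin(np.linspace(0, 4 * np.pi, 100))
preds = np.stack([y_true + 0.05 * rng.standard_normal(100),
                  y_true + 0.20 * rng.standard_normal(100),
                  y_true + 0.10 * rng.standard_normal(100)])

# Actions: candidate weight vectors on a coarse simplex grid (step 0.1).
grid = [np.array([i, j, 10 - i - j]) / 10.0
        for i in range(11) for j in range(11 - i)]

Q = np.zeros(len(grid))       # single-state Q-table, one entry per action
alpha, eps = 0.1, 0.2         # learning rate, exploration rate
for _ in range(2000):
    a = rng.integers(len(grid)) if rng.random() < eps else int(np.argmax(Q))
    reward = -np.mean(np.abs(grid[a] @ preds - y_true))  # negative MAE
    Q[a] += alpha * (reward - Q[a])                      # trial-and-error update

best_w = grid[int(np.argmax(Q))]
print(best_w)
```

The epsilon-greedy rule balances trying new weight combinations against exploiting the best one found so far; over many episodes the Q-table converges toward the weights with the smallest validation error.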
4. Conclusions
Load forecasting is crucial for maintaining the stable operation of the power grid. This paper proposes an outlier correction, decomposition, and ensemble reinforcement learning model for load power prediction. The HI-CEEMDAN-Q-TEG model uses the HI outlier correction method to detect and correct outliers. The CEEMDAN decomposition method is employed to break down raw load power data into various subseries to reduce volatility. Furthermore, the commonly used reinforcement learning method Q-learning is utilized to generate optimal weights by combining the forecasting results of three single models: TCN, ELM, and GRU. Based on the aforementioned experiments, some conclusions can be drawn as follows:
The utilization of HI significantly improves prediction accuracy. HI detects and eliminates outliers in the original data, reducing their interference in model training, improving its data fitting ability, and ultimately enhancing its forecasting performance;
Using TCN, ELM, and GRU as the base learners confers significant advantages, and the ensemble model employing the Q-learning method yields superior forecasting performance compared to the individual base learners. As a reinforcement learning method, Q-learning optimizes the weights of the base learners via trial and error within the given environment;
Out of the four decomposition algorithms examined in this study, CEEMDAN exhibited superior forecasting performance. Unlike the other algorithms, CEEMDAN effectively handles non-stationary data and mitigates the impact of unsteady components on forecasting results;
The load power prediction model proposed in this study incorporates several techniques to enhance its accuracy. Firstly, it leverages the use of HI to correct any outliers. Next, it combines the strengths of various intelligent models by employing ensemble reinforcement learning. Additionally, CEEMDAN is adopted to further enhance the prediction results, resulting in exceptional load power prediction performance.
However, there are some limitations to the proposed model in this paper: (a) as a short-term forecasting model, the proposed model is designed to capture immediate changes and it may not be able to capture longer-term trends that develop over weeks, months, or years; and (b) the proposed model is relatively time-consuming when using the CEEMDAN decomposition algorithm. Thus, we intend to construct a parallel computing framework to support the proposed method in future work.