Article

Modular Predictor for Day-Ahead Load Forecasting and Feature Selection for Different Hours

1
College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 132022, China
2
School of Electrical Engineering, Northeast Electric Power University, Jilin 132013, China
3
Zhejiang Electric Power Corporation Wenzhou Power Supply Company, Wenzhou 325000, China
*
Author to whom correspondence should be addressed.
Energies 2018, 11(7), 1899; https://doi.org/10.3390/en11071899
Submission received: 25 June 2018 / Revised: 10 July 2018 / Accepted: 12 July 2018 / Published: 20 July 2018
(This article belongs to the Special Issue Optimization Methods Applied to Power Systems)

Abstract

To improve the accuracy of the day-ahead load forecasting predictions of a single model, a novel modular parallel forecasting model with feature selection was proposed. First, load features were extracted from a historic load with a horizon from the previous 24 h to the previous 168 h considering the calendar feature. Second, a feature selection combined with a predictor process was carried out to select the optimal feature for building a reliable predictor with respect to each hour. The final modular model consisted of 24 predictors with a respective optimal feature subset for day-ahead load forecasting. New England and Singapore load data were used to evaluate the effectiveness of the proposed method. The results indicated that the accuracy of the proposed modular model was higher than that of the traditional method. Furthermore, conducting a feature selection step when building a predictor improved the accuracy of load forecasting.

1. Introduction

The main idea of short-term load forecasting (STLF) is to predict future loads with horizons of a few hours to several days. Accurate STLF predictions play a vital role in electrical department load dispatch, unit commitment, and electricity market trading [1]. With the permeation of renewable resources in grids and the technological innovation of electric vehicles, load components become more complex and make STLF difficult; therefore, strict requirements of stability and accuracy are needed [2,3,4,5,6].
STLF is a long-standing but still valuable research topic. General forecasting methods can be divided into two branches: statistical methods and artificial intelligence methods. Statistical methods such as regression analysis, exponential smoothing, the Kalman filter, and the autoregressive integrated moving average (ARIMA) are easy to apply, but they have difficulty modeling complex loads [7,8,9]. Artificial intelligence methods, which include fuzzy logic, the artificial neural network (ANN), the support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF), show better performance than statistical methods in load forecasting [10,11,12,13,14,15,16,17]. Fuzzy logic describes the relationship between input and output through a list of rules. However, the prior knowledge required to select the parameters of the membership functions and the rules makes the modeling process complex [18]. The ANN is applied to the STLF of power systems owing to its self-learning ability and robustness to data noise. However, shortcomings such as the difficulty in determining the initial network parameters and over-fitting still exist [19]. By adopting the structural risk minimization principle, the SVM balances model complexity against learning ability. With low-dimension conditions and few samples, the SVM can maintain its generalization ability. Compared to the ANN, the SVM has many advantages. However, the parameters of the SVM should be determined by an optimization algorithm such as the genetic algorithm or the particle swarm optimization algorithm [20,21]. GPR is a kernel-based algorithm in which the prior function is established in the form of a probability distribution and the posterior function is obtained by Bayesian inference. The parameters of the kernel function in GPR are obtained automatically during training [22]. RF is an ensemble machine-learning algorithm based on decision trees. The main advantages of RF are its robustness to noise and its insensitivity to its parameters [23].
In addition to the forecasting method, input feature selection is a vital factor that influences the accuracy and efficiency of load forecasting. A model using only a few features has difficulty capturing the effect of external conditions on the load. However, as the complexity of a model increases, its accuracy and efficiency can suffer. Feature selection is the process of selecting a subset of variables from an original high-dimensional variable set that retains the most effective variables while reducing the effects of the irrelevant ones [24]. Feature selection methods can be classified as wrapper, filter, and embedded [25]. In the wrapper method, the performance of a predictor is chosen as the criterion for feature selection. An exhaustive search is performed to identify, from the numerous combinations of features, the optimal feature subset at which the predictor performs best. However, for N features the wrapper method needs to evaluate $2^N$ subsets, which leads to an NP-hard problem when there are many features [26]. Therefore, evolutionary algorithms such as the memetic algorithm [27], the genetic algorithm [28], and the particle swarm optimization algorithm [29] are used to reduce the computational complexity. Filter methods, such as mutual information (MI) and RreliefF, are ranking methods that evaluate features by analyzing the relationship between the inputs and outputs; a feature score or weight is given to each feature for ranking. To acquire an optimal feature subset, the accuracy of the predictor is used as the criterion [30]. Compared to wrapper methods, filter methods do not rely on other learning algorithms, and their computational cost is low [31,32,33]. Embedded methods, such as the classification and regression tree (CART) and RF, combine feature selection with a learning algorithm and compute the importance value of features during the training process [25]. 
Experiments need to be performed according to a specific forecasting case that considers the advantages and disadvantages of different kinds of feature selection methods, the size of training sets, and the performance of a predictor to determine the most-accurate forecasting method.
Although the performance of a predictor can be improved by feature selection, it should be noted that the load time series presents a day-cycle characteristic, which means that the load characteristics at the same time on different days are similar [34]. In addition, the load at different hours of a day is affected by consumption behavior, which leads to significantly different feature responses. A single predictor with feature selection that forecasts all future load periods may not meet the requirements of the load at different hours, and the accuracy of the overall forecast will decrease. Therefore, a modular model that consists of several single predictors, each forecasting the load of a different hour, is needed. With a modular predictor and feature selection for a specific hour of load, the relation between the load at that hour and each feature can be analyzed, and thus the accuracy can be improved [35]. In addition, in electric power dispatching, different electric power departments require the STLF result to be submitted at different times. Therefore, when constructing a candidate feature set for STLF, the time factor should be considered.
Considering the construction of a feature set, feature selection, and modeling objects, a novel modular parallel forecasting model with feature selection for day-ahead load forecasting was proposed. First, to meet the requirement of the dispatch department and electricity market, the load time series which records every hour according to different forecasting moments was reconstructed to a different load sub-time series. Second, the candidate feature set included 173 features extracted from historic load and calendar. Then, five feature selection methods—MI, conditional mutual information (CMI), RreliefF, CART, and RF—were used to analyze the importance between each feature and different prediction targets and to rank the features in descending order. Third, combined with various predictors, the sequential forward-selection algorithm and a decision criterion based on the mean absolute percentage error (MAPE) were utilized to obtain optimal feature subsets corresponding to different prediction targets. Finally, the optimal modular predictor including several optimal sub-predictors with optimal feature subsets for different forecasting periods was built. The optimal combination method was determined by comparing the forecast results. The proposed method was tested through a day-ahead load forecasting experiment using actual load data from New England and Singapore.

2. Feature Selection

The input feature (variable), as one of the key factors in a predictor build, has a significant influence on the accuracy of the predictor in day-ahead load forecasting. In this study, the filter method and embedded method were adopted for feature selection before building the predictor.

2.1. Filter Method of Feature Selection

The filter method is a feature ranking method that computes a feature’s numerical value to evaluate its importance. Therefore, the estimation of a feature is important to the feature selection result. MI, CMI, and RreliefF methods were used as filters in this study.

2.1.1. Mutual Information

The Mutual Information (MI) method measures the common information between two random variables. For two random variables X and Y, the MI between X and Y can be estimated as:
$$I(X,Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)} \tag{1}$$
where P(x) and P(y) are the marginal probability density functions of X and Y, respectively, and P(x,y) is their joint probability density function. In load forecasting, the feature is defined as X, the target variable is defined as Y, and I(X,Y) represents the strength of their relevance. The larger I(X,Y) is, the more strongly X and Y depend on each other. If I(X,Y) is zero, X and Y are independent. The MI method measures the relevance between a feature and a target variable effectively; however, it does not account for the redundancy among features.
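As a rough illustration of how the MI score of one feature could be estimated from samples, the sketch below discretizes both variables into equal-width bins and applies the definition of I(X,Y) to the resulting histogram. The function name `mutual_information` and the bin count are assumptions made for this example; the estimator actually used in the study is not specified here.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=4):
    """Histogram estimate of I(X, Y) in nats for two equal-length samples."""

    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    count_x, count_y, count_xy = Counter(bx), Counter(by), Counter(zip(bx, by))

    mi = 0.0
    for (i, j), c in count_xy.items():
        # P(x,y) / (P(x) P(y)) equals c * n / (count_x * count_y)
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi
```

A feature that is an exact copy of the target receives the maximum score for the chosen binning, while a feature that is independent of the target scores zero.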

2.1.2. Conditional Mutual Information

The Conditional Mutual Information (CMI) method measures the relevance of two variables when a third variable is known. In the feature selection of load forecasting, suppose the selected feature set is S; the CMI between feature Xi and target Y is then defined as:
$$I(Y; X_i \mid S) = I(Y; S, X_i) - I(Y; S) \tag{2}$$
where I(Y;Xi|S) represents the new information that Xi supplies to S. The larger I(Y;Xi|S) is, the more information Xi can supply, and the less is the redundancy to S. Compared to the MI method, the redundancy among features can be reduced by CMI.
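The redundancy argument can be made concrete with a small sketch for discrete variables. It assumes the chain-rule form I(Y;Xi|S) = I(Y;S,Xi) − I(Y;S); the helper names `discrete_mi` and `conditional_mi` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def discrete_mi(a, b):
    """I(A; B) in nats for two equal-length sequences of hashable symbols."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (ca[u] * cb[v]))
               for (u, v), c in cab.items())


def conditional_mi(y, x_i, selected):
    """I(Y; X_i | S) via the chain rule; `selected` is a list of feature columns."""
    # Represent the selected set S as one joint symbol per sample.
    s = list(zip(*selected)) if selected else [()] * len(y)
    s_plus = [row + (v,) for row, v in zip(s, x_i)]
    return discrete_mi(y, s_plus) - discrete_mi(y, s)
```

With this definition, a candidate that duplicates an already selected feature scores zero, whereas a candidate that is only informative together with S (for example, the second input of an XOR target) receives a positive score even though its plain MI with the target is zero.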

2.1.3. RreliefF

RreliefF is the extended version of Relief for regression [36]. The quality of a feature is measured by its estimated weight. Relief works by randomly selecting an instance and searching for its nearest neighbor from the same class and from a different class. The weight W[Xi] of feature Xi estimated by Relief is an approximation of the following difference of probabilities:
$$W[X_i] = P(\text{diff. value of } X_i \mid \text{nearest inst. from diff. class}) - P(\text{diff. value of } X_i \mid \text{nearest inst. from same class}) \tag{3}$$
In classification, whether two instances belong to different classes can be read directly from their class labels. For STLF, however, the predicted value is continuous, so RreliefF instead models the probability that two instances differ by the relative distance between their predicted values, and Equation (3) should be reformulated. By using Bayes’ theorem, W[Xi] can be obtained as:
$$W[X_i] = \frac{P_{diffC \mid diffX_i}\, P_{diffX_i}}{P_{diffC}} - \frac{\left(1 - P_{diffC \mid diffX_i}\right) P_{diffX_i}}{1 - P_{diffC}} \tag{4}$$
where $P_{diffX_i}$ is the probability that two nearest instances have different values of Xi, $P_{diffC}$ is the probability that they have different predicted values, and $P_{diffC \mid diffX_i}$ is the probability that they have different predicted values given that their values of Xi differ.

2.2. Embedded Method for Feature Selection

In the embedded method, feature selection is performed during the training process where the contribution of the feature combination is efficiently evaluated. The embedded method can be directly applied to STLF and can collaborate with other feature selection methods according to their estimated importance.

2.2.1. Classification and Regression Tree

The Classification and Regression Tree (CART) method uses a binary recursive partitioning algorithm [37]. By splitting the current samples into two sub-samples, a father node generates two child nodes. The final model of CART is a simple binary tree.
The generation of the CART can be divided into two steps:
Step one: first, the root node is split. A best feature Xbest chosen from the feature set serves as the criterion for node splitting. To select the best feature, the minimum variance of the child nodes is used as the objective function. The variance of the child node q of Xi is defined as:
$$\operatorname{var}(q) = \sum_{X_i \in q} \left(y_i - \bar{y}_q\right)^2 \tag{5}$$
where $\bar{y}_q$ is the average of the observation values $y_i$ at node q. The importance of feature Xi according to the variance is defined as:
$$VC(X_i) = \frac{1}{\sum_{X_i \in q} \left(y_i - \bar{y}_q\right)^2} \tag{6}$$
Step two: for each child node, repeat Step one until the CART grows completely. The predictive model can be expressed as t(x, T), where T = {(xi, yi)}, i = 1, 2, …, n, with x ∈ R, is the training set. For STLF, the forecast load $\hat{y}$ is obtained by inputting a new sample $\hat{x}$:
$$\hat{y} = t(\hat{x}, T) \tag{7}$$
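Step one can be sketched as a search over features and thresholds for the split with the smallest summed child-node variance. This is a minimal illustration rather than the rpart implementation; the names `sum_squared_dev` and `best_split`, and the use of midpoints between distinct values as candidate thresholds, are assumptions.

```python
def sum_squared_dev(ys):
    """Sum of squared deviations from the node mean (the node variance above)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, targets):
    """Return (feature_index, threshold, cost) minimizing the child-node variance.

    rows: list of feature vectors; targets: list of observed loads.
    """
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, targets) if r[f] <= thr]
            right = [y for r, y in zip(rows, targets) if r[f] > thr]
            cost = sum_squared_dev(left) + sum_squared_dev(right)
            if cost < best[2]:
                best = (f, thr, cost)
    return best
```

Applying `best_split` recursively to the two resulting child nodes corresponds to Step two.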

2.2.2. Random Forest

Random Forest (RF) is a machine-learning algorithm that combines CARTs with bootstrap sampling for classification and regression [38]. For a training set T with n samples, a bootstrap sample is drawn by randomly selecting n samples from T with replacement. Each sample is selected with probability 1/n in every draw, which means that one sample may appear several times. After a complete bootstrap sample is drawn, the samples that were not selected form the out-of-bag (OOB) dataset. Different from CART, the feature for node splitting in RF is selected from m features that are randomly chosen from the original feature set. The basis of selecting the best feature for node splitting is Equation (5). The predictive output of RF is obtained by averaging the results of the trees:
$$\hat{y} = \frac{1}{N_t} \sum_{i=1}^{N_t} t(\hat{x}, T_i) \tag{8}$$
where Nt is the number of trees.
In addition, the OOB error and the importance of each feature are computed in the process of modeling. Each tree has an OOB dataset, and the OOB error is evaluated by predicting the OOB dataset using the tree model corresponding to the OOB dataset. The OOB error is defined as:
$$e = \frac{1}{N_t} \sum_{i=1}^{N_t} \left(y_i - \hat{y}_i\right)^2 \tag{9}$$
A feature’s importance is estimated by permuting the feature and averaging, over all trees, the difference between the OOB errors before and after the permutation. For instance, for the ith tree, whose OOB data is $OOB_i$ and whose OOB error is $e_i$, the OOB data after permutation is $OOB_i'$ and the OOB error is $e_i'$. The feature’s importance in this tree is computed as:
$$VI_i = e_i' - e_i \tag{10}$$
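The bootstrap/OOB split and the per-tree permutation importance can be sketched as follows. The tree itself is abstracted as a `predict` callable, which is a stand-in assumed for this example; `bootstrap_split`, `mse`, and `permutation_importance` are illustrative names, not functions of any particular RF library.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; the indices never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, oob_rows, oob_targets, feature, rng):
    """Importance of one feature in one tree: OOB error after shuffling the
    feature's values minus the OOB error before shuffling."""
    base_error = mse(oob_targets, [predict(r) for r in oob_rows])
    column = [r[feature] for r in oob_rows]
    rng.shuffle(column)
    permuted = [r[:feature] + [v] + r[feature + 1:]
                for r, v in zip(oob_rows, column)]
    permuted_error = mse(oob_targets, [predict(r) for r in permuted])
    return permuted_error - base_error
```

Averaging the returned value over all trees gives the forest-level score; a feature that the tree never uses yields exactly zero because shuffling it leaves every prediction unchanged.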

3. The Short-Term Load Forecasting (STLF) Predictor

Selecting an appropriate predictor is key to improving the accuracy of STLF. Five state-of-the-art predictors were applied in this study: support vector regression (SVR), back-propagation neural network (BPNN), CART, GPR, and RF. The SVR, BPNN, and GPR are introduced briefly in this section. The detailed mathematical theories of these algorithms are shown in the references [39,40,41].

3.1. Support Vector Regression

By using the ε-insensitive loss function, the support vector machine (SVM), which was originally used only for classification, is extended to regression so that it can be applied to load forecasting in power systems; the extended method is called support vector regression (SVR).
Given a training set T, the goal is a load model whose predictive value f(x) is as close as possible to the true load y:
$$f(x) = \omega^{T} x + b \tag{11}$$
In SVR, the maximum difference that can be tolerated between f(x) and y is $\varepsilon$. The mathematical model can be expressed as:
$$\begin{cases} \max\limits_{\alpha,\alpha^{*}} \left[ -\dfrac{1}{2} \sum\limits_{i=1}^{n} \sum\limits_{j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) - \sum\limits_{i=1}^{n} (\alpha_i + \alpha_i^{*})\,\varepsilon + \sum\limits_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, y_i \right] \\ \text{s.t.} \begin{cases} \sum\limits_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0 \\ 0 \le \alpha_i, \alpha_i^{*} \le C \end{cases} \end{cases} \tag{12}$$
where C is the regularization parameter, $K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j)$ is the kernel function, and $\alpha_i$, $\alpha_i^{*}$ are the Lagrange multipliers.
The radial basis function selected in this study is expressed as:
$$K(x_i, x_j) = \exp\left( -\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}} \right) \tag{13}$$
where $\sigma^{2}$ is the kernel width.
The SVR model is obtained by solving Equation (12):
$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b \tag{14}$$
where b is the bias value.
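Once the dual problem has been solved, evaluating the SVR model only requires the kernel and the coefficient differences. The sketch below shows that evaluation step, assuming the multipliers are already available from some solver; `rbf_kernel`, `svr_predict`, and the argument `dual_coefs` (holding the difference of the two multipliers per support vector) are names introduced for this example.

```python
import math


def rbf_kernel(x_i, x_j, sigma2):
    """Radial basis kernel exp(-||x_i - x_j||^2 / (2 * sigma2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma2))


def svr_predict(x, support_vectors, dual_coefs, bias, sigma2):
    """Evaluate f(x) as the kernel expansion over the support vectors plus b.

    dual_coefs[i] is the difference of the two Lagrange multipliers of
    support vector i, assumed to come from a solver of the dual problem.
    """
    return bias + sum(c * rbf_kernel(sv, x, sigma2)
                      for sv, c in zip(support_vectors, dual_coefs))
```

Training samples whose two multipliers are equal contribute nothing to the sum, which is why only the support vectors need to be stored.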

3.2. Back-Propagation Neural Network

The Back-Propagation Neural Network (BPNN) is a type of artificial neural network consisting of an input layer, a hidden layer, and an output layer, trained by a back-propagation algorithm with the mean squared error (MSE) as the objective function. The main idea of the BPNN is to propagate the output-layer error from back to front, from which the error of the hidden layer is computed. The learning process of the BPNN is divided into two steps:
Step 1: The output of each neural unit in the input and hidden layers is estimated.
Step 2: By using the output error, the error of each neural unit, which is used for updating the weights of the former layer, is computed.
The objective function of the gradient minimization is based on:
$$e_f = \frac{1}{2} \sum_i \left(y_i - \hat{y}_i\right)^2 \tag{15}$$
where $y_i$ is the actual value of neural unit i and $\hat{y}_i$ is the predictive value. To minimize $e_f$, a modification value is needed to correct the weight. The modification value is defined as:
$$\Delta w_{ij}(t) = -\eta \frac{\partial e_f}{\partial w_{ij}} + \alpha \Delta w_{ij}(t-1) = -\eta \frac{\partial e_f}{\partial net_i} \frac{\partial net_i}{\partial w_{ij}} + \alpha \Delta w_{ij}(t-1) = -\eta \delta_i O_j + \alpha \Delta w_{ij}(t-1) \tag{16}$$
$$net_i = \sum_j w_{ij} O_j \tag{17}$$
$$O_i = \frac{1}{1 + e^{-net_i}} \tag{18}$$
where η is the learning rate, $net_i$ is the input of neuron i, $O_i$ is the output of neuron i, $\delta_i = \partial e_f / \partial net_i$ is the error term of neuron i, and α is the momentum factor.
The modified weight is:
$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t) \tag{19}$$
The final output $\hat{y}_i$ of neuron i is obtained by iterating the weight $w_{ij}$ until the precision requirement is met.
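The weight update with momentum can be illustrated for the simplest case of a single sigmoid output neuron. This sketch takes the error term as the derivative of the squared-error objective with respect to the neuron input, so the gradient step carries a minus sign; `momentum_step` and its default learning rate and momentum factor are assumptions for illustration, not the settings of the study.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def momentum_step(weights, prev_delta, inputs, target, eta=0.5, alpha=0.9):
    """One weight update of a single sigmoid neuron with a squared-error loss."""
    net = sum(w * o for w, o in zip(weights, inputs))
    y_hat = sigmoid(net)
    # Error term: derivative of 0.5 * (target - y_hat)^2 with respect to net.
    delta = -(target - y_hat) * y_hat * (1.0 - y_hat)
    # Gradient step plus the momentum term from the previous update.
    new_delta = [-eta * delta * o + alpha * d
                 for o, d in zip(inputs, prev_delta)]
    new_weights = [w + d for w, d in zip(weights, new_delta)]
    return new_weights, new_delta, y_hat
```

Calling the function repeatedly, each time passing back the returned weights and weight changes, moves the output toward the target.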

3.3. Gaussian Process Regression (GPR)

Gaussian Process Regression (GPR) assumes a random process in which the random variables obey a joint Gaussian distribution and uses it to establish the mapping between input and output. For STLF, the collected load data are polluted by noise. Assuming that the noise follows a normal distribution $\varepsilon \sim N(0, \sigma_n^2)$, the joint prior distribution of the observations $y$ and the predictive value $f_*$ at a new input $x_*$ is defined as:
$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim N\left( 0, \begin{bmatrix} K(X,X) + \sigma_n^2 I_n & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix} \right) \tag{20}$$
where n is the number of training samples, $K(X,X)$ is the covariance matrix, and $I_n$ is the identity matrix.
The posterior distribution of $f_*$ is defined as:
$$f_* \mid X, y, x_* \sim N\left[ \bar{f}_*, \operatorname{cov}(f_*) \right] \tag{21}$$
where $\bar{f}_*$ is the mean value of $f_*$ and $\operatorname{cov}(f_*)$ is the variance of $f_*$.
Then, $\bar{f}_*$ and $\operatorname{cov}(f_*)$ can be computed as:
$$\begin{cases} \bar{f}_* = K(x_*, X) \left[ K(X,X) + \sigma_n^2 I_n \right]^{-1} y \\ \operatorname{cov}(f_*) = k(x_*, x_*) - K(x_*, X) \left[ K(X,X) + \sigma_n^2 I_n \right]^{-1} K(X, x_*) \end{cases} \tag{22}$$
The covariance function of GPR is the squared exponential function:
$$k(x, x') = \sigma_f^2 \exp\left[ -\frac{1}{2} (x - x')^{T} M^{-1} (x - x') \right] \tag{23}$$
where $\theta = \{M, \sigma_f^2, \sigma_n^2\}$ is the set of hyper-parameters, which can be solved by the maximum likelihood method [41].
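The posterior mean and variance can be computed directly from the formulas above for a single test input. The sketch below is a plain-Python illustration with fixed hyper-parameters; it assumes an isotropic kernel matrix (M equal to a scalar `length2` times the identity) and uses a small hand-written linear solver, and all function names are hypothetical.

```python
import math


def sq_exp(x, z, sigma_f2=1.0, length2=1.0):
    """Squared exponential kernel with M = length2 * I (an assumption)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return sigma_f2 * math.exp(-0.5 * d2 / length2)


def solve(a, b):
    """Solve a @ v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gpr_predict(x_train, y_train, x_star, sigma_n2=0.1):
    """Posterior mean and variance of the latent value at one test input."""
    n = len(x_train)
    # Noisy training covariance: kernel matrix plus sigma_n2 on the diagonal.
    k_mat = [[sq_exp(x_train[i], x_train[j]) + (sigma_n2 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    k_star = [sq_exp(x_star, x) for x in x_train]
    mean = sum(k * w for k, w in zip(k_star, solve(k_mat, y_train)))
    var = sq_exp(x_star, x_star) - sum(
        k * w for k, w in zip(k_star, solve(k_mat, k_star)))
    return mean, var
```

Far from all training inputs the prediction falls back to the zero prior mean with the full prior variance, which is the behaviour the joint Gaussian prior implies.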

4. Data Analysis

4.1. Load Analysis

Affected by different factors, load sequence appears as a type of complicated non-linear time series. To construct a reasonable original feature set and achieve better forecasting for a region, the load characteristics and other factors should be analyzed.
Figure 1 shows the power load of New England over different time lengths. Figure 1a,b shows that the load patterns from 2011 to 2013 are similar. Influenced by climate, load patterns differ by season. In Figure 1c, the load curves of two consecutive weeks in each of the four seasons are presented (the first day is Monday). The weekday and weekend load demands clearly differ, and the load demand follows a cycle with a period of seven days. The Tuesday load curves of the different seasons in Figure 1d show that the Tuesday load patterns of different weeks are similar. The load increased rapidly from 6:00 to 11:00, which corresponds to the beginning of work, and reached the first peak. The second peak occurred from 19:00 to 20:00.
As analyzed above, the load characteristics can be summarized as follows:
(1) The load patterns of the same day of the week are similar, which represents the week-cycle of the load.
(2) The weekday load patterns are similar to each other, as are the weekend load patterns, which represents the day-cycle of the load.

4.2. Candidate Feature Set

An appropriate feature set plays a significant role in modeling an uncomplicated but outstanding predictor. However, a candidate feature set that contains sufficient information must be found to ensure that effective features can be selected by the feature selection method. The two main feature types are the endogenous predictor (load feature) and the exogenous predictor (calendar feature).
The time interval between the moment at which the forecasting result must be submitted to the dispatch department and the predicted moment should be considered when extracting features. To ensure the universality of the original feature set, we used the interval time p = 24. A feature set consisting of 145 internal historic load features (from lag 24 to lag 168) from a one-week data window was chosen as a part of the candidate feature set. The maximum, minimum, and mean loads were also included. In addition to the load features, calendar features such as hour of day, day type, working day, and non-working day were also considered. The candidate feature set with 173 features was formed as shown in Table 1.
Feature explanation:
Endogenous predictor:
$F_{L(\max,\, d-k)}$ is the maximum power load k days before, k = 2, 3, 4, 5, 6, 7.
$F_{L(\min,\, d-k)}$ is the minimum power load k days before, k = 2, 3, 4, 5, 6, 7.
$F_{L(\text{mean},\, d-k)}$ is the average power load k days before, k = 2, 3, 4, 5, 6, 7.
$F_{L(t-i)}$ is the historic power load i hours before the forecasting hour t, i = 24, 25, 26, …, 168.
Exogenous predictor:
$F_{DW}$ is the day of week, encoded as 0 or 1 (W = 1, 2, 3, 4, 5, 6, 7 represents Monday to Sunday).
$F_{W}$ is the work-day indicator (0 is a work day and 1 is a non-work day).
$F_{Hour}$ is the hour of day (1 to 24).
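To make the feature definitions concrete, the following sketch assembles one candidate feature vector from an hourly load series. It assumes that index 0 of the series is the first hour of a day and that the day of week is one-hot encoded; under this reading the vector has 172 entries rather than the 173 reported, so the exact encoding in Table 1 presumably differs in some detail. The function name `build_features` is hypothetical.

```python
def build_features(loads, t, weekday, is_non_work_day):
    """Candidate features for forecasting loads[t].

    loads: hourly series whose index 0 is the first hour of a day (assumption).
    weekday: 1..7 for Monday..Sunday; is_non_work_day: 0 or 1.
    """
    if t < 168:
        raise ValueError("need at least one week of history")
    # Lagged loads from 24 to 168 hours before the forecasting hour.
    lags = [loads[t - i] for i in range(24, 169)]
    day = t // 24
    stats = []
    for k in range(2, 8):  # daily statistics 2 to 7 days before
        block = loads[(day - k) * 24:(day - k + 1) * 24]
        stats += [max(block), min(block), sum(block) / 24]
    day_of_week = [1 if weekday == w else 0 for w in range(1, 8)]
    hour = t % 24 + 1  # hour of day, 1..24
    return lags + stats + day_of_week + [is_non_work_day, hour]
```

Because the shortest lag is 24 hours, every entry is available one day ahead of the forecasting hour, which is what the interval p = 24 requires.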

5. Experimental Setup

5.1. Proposed STLF Process with Feature Selection

Figure 2 provides an overview of the proposed method which covers the construction of the feature set, the dataset separation, and the feature selection for the load with respect to the different hours and the modeling for different hours. Figure 2a shows the one-day structure of a sample. The inputs include 173 features, and the output is the predicted load.
The diagram of the proposed method is displayed in Figure 2b. The training set was separated into 24 training subsets corresponding to each hour. The features in each training subset were ranked in descending order according to their feature scores as computed by the feature selection method. Then, the optimal feature subset was selected using the predictor and the MAPE-based criterion. Finally, the modular predictor was constructed from the 24 predictors with their respective optimal subsets.
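The modular structure described above can be sketched as a per-hour split of the samples plus a dispatcher that routes each sample to the sub-predictor of its hour. The names `split_by_hour` and `ModularPredictor`, and the representation of a trained sub-predictor as a plain callable, are assumptions made for this illustration.

```python
from collections import defaultdict


def split_by_hour(samples):
    """Group (hour, features, load) samples into one training subset per hour."""
    subsets = defaultdict(list)
    for hour, features, load in samples:
        subsets[hour].append((features, load))
    return dict(subsets)


class ModularPredictor:
    """Dispatches each sample to the sub-predictor trained for its hour."""

    def __init__(self, predictors, feature_subsets):
        self.predictors = predictors            # hour -> callable(list) -> load
        self.feature_subsets = feature_subsets  # hour -> list of feature indices

    def predict(self, hour, features):
        # Keep only the optimal feature subset of this hour before predicting.
        selected = [features[i] for i in self.feature_subsets[hour]]
        return self.predictors[hour](selected)
```

In the full model, `predictors` and `feature_subsets` would each hold 24 entries, one per hour of the day.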
The process of selecting the optimal feature subset in modeling is shown in Figure 2c. Following the ranked feature order, the predictor was used to test the feature subset consisting of the top i features, and the MAPE-based criterion was used to select the optimal feature subset.
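A minimal sketch of this selection loop is given below, assuming the criterion is simply the lowest validation MAPE over all top-i prefixes of the ranked list. The callable `evaluate_mape` is a hypothetical hook that would train the chosen predictor on the given features and return its validation MAPE.

```python
def select_optimal_subset(ranked_features, evaluate_mape):
    """Add ranked features one by one and keep the prefix with the lowest MAPE.

    ranked_features: features sorted by descending feature score.
    evaluate_mape: callable mapping a feature list to a validation MAPE
        (hypothetical hook wrapping predictor training and evaluation).
    """
    best_subset, best_mape = [], float("inf")
    for i in range(1, len(ranked_features) + 1):
        subset = ranked_features[:i]
        mape = evaluate_mape(subset)
        if mape < best_mape:
            best_subset, best_mape = subset, mape
    return best_subset, best_mape
```

Because only prefixes of the ranking are tested, the search needs one predictor evaluation per candidate feature instead of one per possible subset.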

5.2. Dataset Split

The data used in this study were from New England [42] and Singapore [43]. The New England data were recorded every hour from 2011 to 2013 for a total of 26,304 data points. The Singapore data were recorded every half hour from 2014 to 2015 for a total number of 35,040 data points. To apply the proposed method, the hourly load of Singapore was extracted to form a new load time series. The data used for training and testing the predictor consisted of the feature set (173 features) and the predictive object (the load corresponding to different hours) as shown in Figure 2.
Each dataset was split into three parts: a training set (14,616 New England samples, 11,712 Singapore samples), a validation set (2928 New England samples, 2094 Singapore samples), and a test set (8760 New England samples, 2904 Singapore samples). The training and the validation sets were used to build the predictor and to select an optimal feature subset. The test set was used to examine the performance of the feature subset and the predictor. Detailed information about the datasets is shown in Table 2.

5.3. Evaluation Criterion

To evaluate the performance of the proposed method, three criteria, the MAPE, the mean absolute error (MAE), and the root mean square error (RMSE) were used as follows:
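$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \tag{24}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{25}$$
$$RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 } \tag{26}$$
where $y_i$ is the actual load, $\hat{y}_i$ is the predictive load, and n is the number of predictive loads.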
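The three criteria translate directly into code. The sketch below is a plain-Python version under the assumption that the actual loads are non-zero, since the MAPE divides by them.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual loads must be non-zero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error in the unit of the load."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error in the unit of the load."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))
```

The MAPE is scale-free, while the MAE and RMSE carry the unit of the load, and the RMSE penalizes large individual errors more strongly than the MAE.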

6. Results

The software used was MATLAB 2016b (Version 9.1.0.441655, Mathworks Inc., Natick, MA, USA) and R x64 3.3.2 (Version 3.3.2, GNU Project, developed at Bell Laboratories). Note that the CART algorithm in the rpart package in R assigns importance values to only a part of the features, and these values sum to 100. The parameters of each predictor were set as follows:
BPNN: the number of neurons in the hidden layer was $N_{neu} = 2 \times N_{feature} + 1$, and the number of iterations was T = 1000 [44].
SVR: the regularization parameter C = 1, the insensitive-loss parameter ε = 0.1, and the kernel width σ² = 2 [15].
RF: $m = N_{feature}/3$ and $N_{Tree} = 500$ [16,23].
CART: no pruning parameter was set because the tree grows completely.
GPR: the hyper-parameters of GPR were tuned by learning from the training data.

6.1. Load Forecasting for New England

6.1.1. Feature Selection for Different-Hour Loads

Feature Score for Feature Analysis

Feature selection methods rate the importance of a feature by assigning a numerical value to represent the relation between the feature and the target. For example, the value of a feature computed by MI is called an MI value, while that computed by RF and CART is called its permutation importance. The feature score is used for easy description. Parts of normalized feature score curves computed by different feature selection methods are shown in Figure 3. The feature score curves of typical hours (hour 5, hour 6, hour 10, and hour 11 when the valley and peak loads appear) were chosen for analysis. The feature score curves that used the same feature score calculation method were different at various hours. For example, the MI curves were much different for hour 5, hour 6, hour 10, and hour 11, and the features with the highest scores were different from each other (marked by a red circle).
The feature score shows the importance between the feature and the target variable. When selecting a feature subset, the feature with the highest score should be retained and one with the lowest should be eliminated.
The top 10 features after ranking are shown in Table 3, where it is clear that the top 10 features selected by the various methods for the same hour were similar. For example, for hour 5, the methods shared features such as $F_{L(t-24)}$, $F_{L(t-25)}$, $F_{L(t-26)}$, and $F_{L(t-27)}$ and selected similar features such as $F_{L(t-28)}$, $F_{L(t-29)}$, $F_{L(t-30)}$, and $F_{L(t-31)}$. However, there was an obvious difference between the features of hour 5 and hour 6, which may have been caused by the different load characteristics shown in Figure 1d.
Therefore, a feature analysis for each hour is required to choose the best features for improving the accuracy of STLF.

Optimal Feature Subset Selection Process

According to the trend of the feature score curves of diverse feature selection methods, the first 36 to 50 features were chosen as the optimal features for modeling in [30]. By analyzing the autocorrelation of the lag variables, 50 features were selected for very-short-term load forecasting in [41]. Most studies, however, do not give a specific threshold for selecting the optimal feature subset. In this study, the features were ranked in descending order of feature score and added to the feature subset one by one; the MAPE of the predictor at each step was used as the criterion for selecting the optimal feature subset.
Figure 4 shows the MAPE curves of the feature selection processes for different combinations of feature selection method and predictor. As shown in each subplot in Figure 4, the MAPE decreased as the number of features increased and reached a minimum value. For example, with MI as the feature selection method for hour 5, the MAPEs of BPNN, CART, GPR, RF, and SVR when using only the top feature were 4.587%, 4.743%, 4.618%, 5.196%, and 4.718%, respectively. When 20 features were used, the MAPEs were reduced to 3.901%, 4.555%, 4.008%, 4.160%, and 3.831%, respectively. The MAPEs of the different predictors decreased to different degrees, indicating that the 20 features contributed positively to building a better prediction model. Similar conclusions can be drawn from the other curves. The dimension of each optimal feature subset and its MAPE are marked by circles whose colors correspond to the different predictors.
The following conclusions can be drawn from Table 3 and Figure 4:
(1) The feature ranking estimated by different feature selection methods varies.
(2) The dimension of the optimal feature subset and its MAPE depends on the predictor-based feature selection method.
(3) The optimal feature subset selected by the same predictor-based feature selection method for the predictive target of different hours is different.
Table 4 shows the MAPE and the dimension of the optimal feature subset obtained with MI as the feature selection method and RF as the prediction model. The table shows that from 1:00 to 6:00 the dimension of the optimal feature subset is smaller than from 7:00 to 19:00, and so is the forecasting error. This is because people are less active at night and fewer factors affect the load than during the day.
The MAPE and the dimension of the optimal feature subset for 1:00 obtained by different feature selection methods and forecasting methods are shown in Table 5. The MAPEs lie between 3% and 4%, which means that the performance of the predictors was similar after feature selection. The feature dimensions, however, differ greatly between the optimal feature subsets selected by different feature selection methods, which is caused by their different evaluation criteria.
The details of the dimension of the optimal feature subset and its MAPE are shown in Appendix A Table A1 to Table A2. Based on a longitudinal comparison, the dimension of optimal feature subsets selected by different feature selection methods with same-hour predictors were different. For instance, the horizon of the hour-2 MAPE calculated by various methods was from 3.107% to 4.050%. The combination RreliefF + SVR method had the smallest MAPE and lower feature subset dimension.
By the horizontal comparison, the dimension of optimal feature subsets selected by the same feature selection method with the same-hour predictor varied. For example, the horizon of dimension of the feature subset corresponding to different hours selected by the RreliefF + SVR method ranged from 13 to 109 and the MAPE range was 3.043% to 4.558%. In addition, the number of features for a night hour was less than the day hour, indicating that the day load components were more complex and more difficult to forecast.
In conclusion, the characteristic of the load to predict for different hours varies; therefore, the load needs a special feature set to build a predictor for special hours. The necessity of using one kind of structure of modular time-scale prediction and feature selection for the load of different hours was verified.
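The per-hour subset selection discussed above can be illustrated with a minimal sketch. It assumes that the features have already been ranked by a selection score and that the subset is chosen by sweeping the top-k prefixes of that ranking and keeping the prefix with the lowest validation MAPE. The names `select_subset`, `fit_and_predict`, and the toy averaging predictor are hypothetical stand-ins for illustration, not the implementation used in the experiments.

```python
from statistics import mean

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

def select_subset(ranked_features, fit_and_predict, y_valid):
    """Sweep the top-k prefixes of a score-ranked feature list.

    ranked_features: feature names sorted from highest to lowest score.
    fit_and_predict: hypothetical callable that trains a predictor on the
        given feature names and returns its validation-set predictions.
    Returns (best_subset, best_mape).
    """
    best_subset, best_mape = None, float("inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        err = mape(y_valid, fit_and_predict(subset))
        if err < best_mape:  # keep the smallest subset that reaches the minimum
            best_subset, best_mape = subset, err
    return best_subset, best_mape

# Toy data (assumed, for illustration only): three validation samples of one hour.
columns = {
    "lag24": [90.0, 210.0, 290.0],
    "lag168": [110.0, 190.0, 310.0],
    "noise": [300.0, 100.0, 200.0],
}
y_valid = [100.0, 200.0, 300.0]

def toy_fit_and_predict(subset):
    # Stand-in "predictor": the row-wise average of the selected columns.
    rows = zip(*(columns[name] for name in subset))
    return [mean(row) for row in rows]

best_subset, best_mape = select_subset(
    ["lag24", "lag168", "noise"], toy_fit_and_predict, y_valid
)
```

In the modular setting, such a sweep would be repeated once per hour and per method combination, which is why each hour can end up with a different subset and a different dimension.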

6.1.2. Forecasting Result of Method Combinations with Optimal Feature Subsets for New England Load Data

To test the performance of the different method combinations with the optimal feature subsets, special weeks were used for the experiment.
The effect of temperature on the load is large in summer and winter, and severe fluctuations make accurate forecasting difficult. Therefore, two weeks were chosen randomly from the summer and winter of 2013 for testing. The summer period was from 28 July to 3 August, and the winter period was from 22 to 28 December. As shown in Figure 5, the predicted load of each combined method fit the true summer load. The average errors of the various methods are shown in Table 6. The top three combined methods were CART + SVR, RreliefF + RF, and RreliefF + SVR, with MAPEs of 3.634%, 3.710%, and 4.204%, respectively. The predicted load of each combined method in winter is shown in Figure 6. Each of the predicted loads matched the actual load except on Tuesday and Wednesday, which corresponded to the day before Christmas and Christmas Day. As shown in Table 7, the top three combined methods were RreliefF + SVR, CART + GPR, and CART + SVR, with MAPEs of 4.207%, 4.754%, and 4.770%, respectively.
For the full verification of the different method combinations, the entire test set was used for the contrast experiment. The results of the different evaluation criteria for the proposed forecasting approach with 25 method combinations are presented in Table 8 for day-ahead load forecasting. The forecast errors of the different methods varied. For example, the error of the MI-based SVR was close to that of the MI-based GPR. The MAPEs for the MI-based SVR and GPR were 4.872% and 4.785%, the RMSEs were 1196.775 MW and 1141.372 MW, and the MAEs were 773.447 MW and 755.325 MW, respectively. Based on these observations, the MAPEs of the SVR with every feature selection method except RF were below 5% (marked in bold). In addition, the MAPEs of GPR with CMI and RF were below 5% as well.
A comparison of the results shows that the RreliefF + SVR method performed best, with the lowest MAPE.
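The three error measures reported in this section (MAPE, MAE, and RMSE) can be computed as in the following sketch. It uses the standard textbook definitions; the function names and the sample values are illustrative assumptions and are not taken from the experimental data.

```python
from math import sqrt

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n

def mae(actual, predicted):
    """Mean absolute error, in the unit of the load (e.g., MW)."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def rmse(actual, predicted):
    """Root mean square error, in the unit of the load (e.g., MW)."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Illustrative values only (assumed), in MW.
actual = [1000.0, 2000.0, 4000.0]
predicted = [1100.0, 1900.0, 4000.0]

sample_mape = mape(actual, predicted)
sample_mae = mae(actual, predicted)
sample_rmse = rmse(actual, predicted)
```

Because MAPE is relative while MAE and RMSE are absolute, the same method can rank differently under the three measures when load levels differ between systems.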

6.2. Load Forecasting for Singapore

To further verify the applicability of the proposed approach, the load data from Singapore was used to perform the load forecasting experiments.

6.2.1. Feature Selection for Hour Loads

First, using the same method used in Section 6.1.1, the score of the feature corresponding to the predictive target at different hours was computed by different feature selection methods. Then, the optimal feature subset was obtained based on the MAPE of different subsets forecast by a predictor.
Table 9 shows the MAPE and the dimension of the optimal feature subset obtained with MI as the feature selection method and RF as the prediction model. From 1:00 to 7:00, both the dimension of the optimal feature subset and the forecasting error are smaller than from 8:00 to 19:00. As in the analysis of Table 4, this is because people are less active at night, so fewer factors affect the load than during the day.
Table 10 shows the MAPE and the dimension of the optimal feature subset for 1:00 obtained with different feature selection and forecasting methods. The MAPEs lie between 1.0% and 1.6%, which means that the forecasters performed similarly after feature selection. The feature dimensions, however, differ greatly between the optimal feature subsets selected by different feature selection methods, which is caused by their different evaluation criteria.
Considering both the MAPEs and the dimensions shown in Tables A3 and A4 of Appendix A, the optimal feature subsets were used for the load forecasting of the Singapore data. Similar to the conclusion drawn from Table 4, the optimal feature subsets differed across feature selection methods and forecasters.

6.2.2. Forecasting Results of Method Combinations with Optimal Feature Subsets for Singapore Load Data

To test the performance of diverse combined methods with the optimal feature subset, the data of special weeks were used for the experiment.
Two weeks were chosen randomly from the summer and winter of 2015 for testing, as shown in Figure 7 and Figure 8. The summer week ran from 21 to 27 June and the winter week from 8 to 14 November. The summer results are shown in Figure 7 and Table 11. The GPR, RF, and SVR methods showed a better ability to forecast the summer loads. The MAPEs of the combinations MI + GPR, CMI + GPR, RF + GPR, RreliefF + GPR, CMI + SVR, and RreliefF + SVR were less than 1.5%. The best combined method was RreliefF + GPR, with an MAPE of 1.402%, an MAE of 74.400 MW, and an RMSE of 93.092 MW. For the winter week (Figure 8 and Table 12), the RreliefF + GPR method again showed the best performance, with an MAPE of 3.567%, an MAE of 200.711 MW, and an RMSE of 224.017 MW. The predictive results of GPR and SVR with different feature selection methods were better than those of the CART, BPNN, and RF methods.
To further verify the superiority of the proposed method based on feature subsets of different hours, the entire test data from Singapore were used for validation. Detailed information about the test data is given in Table 2 in Section 5.2. Table 13 shows the average predictive error of the different combined methods. It indicates that the SVR combined with MI, CMI, RF, and RreliefF achieved the minimum errors, with MAPEs of 1.471%, 1.440%, 1.387%, and 1.373%, respectively. Of all the combined methods, the RreliefF + SVR method worked best, with an MAPE of 1.373%, an MAE of 75.118 MW, and an RMSE of 147.585 MW.
The load forecasting results for Singapore show that the combination of RreliefF and SVR was the most accurate method.

6.3. Comparison and Discussion

6.3.1. Comparison of Forecasting Methods without Feature Selection for New England and Singapore

In this section, the proposed method is compared with the traditional method (which builds only a single predictor for forecasting, without feature selection) on the New England and Singapore data to verify the necessity of forecasting with a modular predictor.
The histograms of the error and the training time of the different forecasting methods on the New England data are displayed in Figure 9. As shown in Figure 9a, the MAPE of the SVR under the proposed method was almost half that of the SVR under the traditional method. The MAPEs of the other predictors under the proposed method without the feature selection step also decreased, to different degrees, compared with the same predictors under the traditional method. The MAE in Figure 9b and the RMSE in Figure 9c lead to a similar conclusion. In addition, the model training time of the proposed method decreased because each predictor is trained on a smaller training set. Therefore, the decreased error and training time reflect the advantages of the proposed method and confirm the necessity of employing a modular predictor.
The values of MAPE, MAE, RMSE, and training time of each forecaster under the two approaches on the New England and Singapore data are shown in Table 14. The results for New England indicate that, with the proposed method, the MAPE values of CART, RF, SVR, BPNN, and GPR were reduced by 0.182%, 2.253%, 4.294%, 1.953%, and 3.775%, respectively, compared with the same predictors under the traditional approach. Similarly, the results for Singapore verified the superior performance of the proposed method.
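The contrast between a single predictor and a modular predictor can be sketched as follows. The example trains 24 independent models, one per hour of the day, each seeing only the samples of its own hour. The per-hour model is a one-feature least-squares line on the load 24 h earlier; this simple model, the class `HourlyLinearModel`, and the synthetic load series are assumptions chosen to keep the sketch self-contained, not the CART, RF, SVR, BPNN, or GPR predictors used in the experiments.

```python
from statistics import mean

class HourlyLinearModel:
    """Stand-in predictor: least-squares line y = a * x + b on one feature."""

    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = sxy / sxx if sxx else 0.0
        self.b = my - self.a * mx
        return self

    def predict(self, x):
        return self.a * x + self.b

def train_modular(load):
    """Train 24 independent models, one per hour of the day.

    load: hourly series that starts at hour 0 of a day and covers whole days.
    Each model uses only the samples of its own hour, so its training set
    is 1/24 of the size a single all-hours predictor would use.
    """
    models = []
    for hour in range(24):
        targets = range(24 + hour, len(load), 24)  # this hour, from day 1 on
        xs = [load[t - 24] for t in targets]       # load 24 h earlier
        ys = [load[t] for t in targets]
        models.append(HourlyLinearModel().fit(xs, ys))
    return models

def forecast_next_day(models, last_day):
    """Day-ahead forecast: one prediction per hour from the matching model."""
    return [models[h].predict(last_day[h]) for h in range(24)]

# Synthetic load (assumed): a daily profile that rises by 10 MW each day.
load = [1000.0 + 50.0 * h + 10.0 * d for d in range(5) for h in range(24)]
models = train_modular(load)
forecast = forecast_next_day(models, load[-24:])
```

Since the 24 models share nothing, they can also be trained in parallel, and each can be given its own feature subset.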

6.3.2. Comparison of Forecasting Approaches with Feature Selection for New England and Singapore

A comparison between the proposed method and the traditional method, both with feature selection, was performed on the New England and Singapore datasets. The results of the proposed method with feature selection are shown in Table 8 (New England) and Table 13 (Singapore), and the results of the traditional method with feature selection are shown in Table 15. The results indicate that adopting the proposed method reduced the error to different degrees. The largest reductions in MAPE, 2.799% and 3.072%, resulted from the CMI + SVR and CART + BPNN methods, respectively. The minimum error was achieved by the RreliefF + SVR combination, with MAPEs of 4.746% (New England) and 1.373% (Singapore). In conclusion, the forecasts obtained by the proposed method were better than those of the traditional method regardless of the predictor used. The most accurate combined method was RreliefF + SVR.

7. Conclusions

Accurate day-ahead load forecasting enhances the stability of grid operations and improves the social benefits of power systems. To improve the accuracy of day-ahead load forecasting, a novel modular parallel forecasting model with feature selection was proposed. Load data from New England and Singapore were used to test the proposed method. The experimental results show the advantages of the proposed method as follows:
(1) A modular predictor consisting of 24 independent predictors can efficiently capture load characteristics with respect to different hours and thereby avoid the inaccurate analysis of a single predictor.
(2) The feature selection adopted for the load of each hour analyzes the relevance between each feature and the load of that specific hour. The resulting optimal feature subsets, which differ in dimension, help to build more accurate predictors.
(3) To serve the demand of dispatch departments of different regions, the interval time p = 24 was chosen for structuring a general candidate feature set that met the requirements of the power system.
Future work will concentrate on optimizing the predictor parameters, improving the efficiency of the modeling process, and applying the proposed method to probabilistic load forecasting.

Author Contributions

L.L. put forward the main idea and designed the overall structure of this paper. L.X. and Z.H. performed the experiments and prepared the manuscript. N.H. guided the experiments and the paper writing. All authors have read and approved the final manuscript.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 51307020), the Science and Technology Development Project of Jilin Province (No. 20160411003XH), the Science and Technology Project of Jilin Province Education Department (No. JJKH20170219KJ), the Major Science and Technology Projects of Jilin Institute of Chemical Technology (No. 2018021), and the Science and Technology Innovation Development Plan Project of Jilin City (No. 201750239).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

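Table A1. Optimal feature subset construction of different hours from 1:00 to 12:00 with different methods for New England. Entries are MAPE (%) / FD.
| Feature selection | Predictor | 1:00 | 2:00 | 3:00 | 4:00 | 5:00 | 6:00 | 7:00 | 8:00 | 9:00 | 10:00 | 11:00 | 12:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MI | CART | 3.741 / 7 | 3.769 / 2 | 4.071 / 12 | 4.083 / 84 | 4.472 / 40 | 5.140 / 6 | 4.949 / 164 | 4.748 / 164 | 4.627 / 164 | 4.765 / 15 | 4.978 / 21 | 5.529 / 130 |
| MI | RF | 3.294 / 34 | 3.419 / 22 | 3.632 / 9 | 3.783 / 30 | 4.008 / 9 | 4.828 / 18 | 5.456 / 61 | 5.314 / 59 | 4.526 / 64 | 4.171 / 45 | 4.147 / 42 | 4.414 / 67 |
| MI | SVR | 3.064 / 10 | 3.167 / 28 | 3.189 / 27 | 3.314 / 23 | 3.553 / 59 | 3.852 / 38 | 4.353 / 57 | 4.327 / 47 | 3.682 / 61 | 3.510 / 58 | 3.598 / 72 | 3.789 / 23 |
| MI | ANN | 3.226 / 8 | 3.329 / 7 | 3.422 / 26 | 3.897 / 27 | 3.521 / 30 | 4.215 / 16 | 4.889 / 40 | 4.345 / 29 | 3.891 / 23 | 3.848 / 31 | 3.911 / 11 | 4.342 / 32 |
| MI | GPR | 3.087 / 119 | 3.226 / 115 | 3.359 / 9 | 3.381 / 99 | 3.629 / 96 | 4.087 / 36 | 4.476 / 54 | 4.432 / 102 | 3.805 / 65 | 3.645 / 49 | 3.781 / 36 | 3.852 / 31 |
| CMI | CART | 3.729 / 2 | 3.769 / 2 | 4.058 / 7 | 4.192 / 16 | 4.296 / 8 | 5.242 / 6 | 4.926 / 99 | 4.523 / 27 | 5.076 / 15 | 4.523 / 27 | 5.076 / 15 | 5.413 / 106 |
| CMI | RF | 3.447 / 20 | 3.447 / 23 | 3.590 / 12 | 3.717 / 13 | 3.848 / 6 | 4.505 / 57 | 4.750 / 40 | 4.531 / 130 | 4.136 / 49 | 3.949 / 159 | 4.009 / 159 | 4.281 / 124 |
| CMI | SVR | 3.043 / 13 | 3.126 / 12 | 3.238 / 12 | 3.341 / 4 | 3.375 / 48 | 3.722 / 91 | 4.008 / 60 | 3.972 / 73 | 3.469 / 88 | 3.351 / 68 | 3.448 / 93 | 3.667 / 88 |
| CMI | ANN | 3.062 / 47 | 3.123 / 42 | 3.134 / 28 | 3.329 / 53 | 3.365 / 23 | 3.821 / 64 | 4.178 / 83 | 4.167 / 35 | 3.590 / 78 | 3.341 / 75 | 3.418 / 57 | 3.576 / 63 |
| CMI | GPR | 3.052 / 134 | 3.189 / 23 | 3.288 / 18 | 3.366 / 16 | 3.593 / 21 | 4.017 / 18 | 4.128 / 150 | 3.911 / 158 | 3.517 / 168 | 3.352 / 91 | 3.455 / 106 | 3.612 / 88 |
| CART | CART | 3.729 / 3 | 4.050 / 6 | 4.071 / 4 | 4.134 / 5 | 4.558 / 6 | 5.596 / 13 | 4.958 / 9 | 4.751 / 4 | 4.634 / 7 | 4.725 / 10 | 4.524 / 19 | 5.512 / 10 |
| CART | RF | 3.422 / 11 | 3.511 / 5 | 3.589 / 6 | 3.615 / 12 | 3.963 / 7 | 4.511 / 32 | 4.512 / 20 | 4.367 / 11 | 4.062 / 22 | 3.989 / 18 | 4.151 / 21 | 4.546 / 11 |
| CART | SVR | 3.068 / 11 | 3.167 / 11 | 3.548 / 11 | 3.433 / 12 | 3.798 / 14 | 3.846 / 33 | 3.804 / 20 | 3.870 / 18 | 3.524 / 22 | 3.629 / 19 | 3.633 / 20 | 4.260 / 18 |
| CART | ANN | 3.270 / 9 | 3.301 / 11 | 3.670 / 5 | 3.483 / 12 | 3.836 / 13 | 4.012 / 15 | 3.974 / 19 | 4.081 / 16 | 3.921 / 17 | 3.775 / 17 | 3.806 / 18 | 4.397 / 11 |
| CART | GPR | 3.245 / 8 | 3.280 / 11 | 3.526 / 11 | 3.458 / 8 | 3.858 / 8 | 3.911 / 33 | 3.753 / 20 | 3.872 / 18 | 3.707 / 22 | 3.659 / 19 | 3.732 / 16 | 4.360 / 11 |
| RF | CART | 3.741 / 7 | 3.769 / 2 | 4.059 / 9 | 4.128 / 44 | 4.552 / 9 | 5.084 / 7 | 4.807 / 52 | 4.656 / 5 | 4.337 / 6 | 4.464 / 10 | 4.147 / 3 | 4.155 / 6 |
| RF | RF | 3.533 / 51 | 3.679 / 27 | 3.790 / 28 | 3.777 / 11 | 3.991 / 25 | 4.522 / 30 | 4.624 / 12 | 4.416 / 9 | 3.973 / 26 | 3.922 / 15 | 3.801 / 28 | 4.227 / 7 |
| RF | SVR | 3.140 / 41 | 3.512 / 14 | 3.312 / 42 | 3.469 / 11 | 3.5518 / 26 | 3.6554 / 19 | 3.9069 / 27 | 3.8594 / 46 | 3.3732 / 31 | 3.3966 / 44 | 3.376 / 43 | 3.7329 / 51 |
| RF | ANN | 3.069 / 18 | 3.255 / 20 | 3.097 / 21 | 3.469 / 17 | 3.486 / 16 | 3.816 / 27 | 4.262 / 11 | 4.278 / 13 | 3.682 / 11 | 3.542 / 19 | 3.424 / 31 | 3.923 / 14 |
| RF | GPR | 3.099 / 81 | 3.239 / 80 | 3.338 / 64 | 3.447 / 150 | 3.632 / 16 | 3.679 / 19 | 4.208 / 15 | 3.964 / 56 | 3.522 / 86 | 3.359 / 63 | 3.484 / 43 | 3.787 / 37 |
| RreliefF | CART | 3.741 / 10 | 3.769 / 2 | 4.059 / 8 | 4.156 / 42 | 4.475 / 20 | 4.917 / 4 | 4.729 / 38 | 4.646 / 15 | 4.443 / 7 | 4.030 / 15 | 4.417 / 4 | 4.448 / 20 |
| RreliefF | RF | 3.310 / 26 | 3.390 / 20 | 3.466 / 17 | 3.560 / 19 | 3.528 / 14 | 3.764 / 14 | 4.534 / 30 | 4.294 / 23 | 3.643 / 10 | 3.514 / 17 | 3.680 / 22 | 4.006 / 18 |
| RreliefF | SVR | 3.043 / 18 | 3.107 / 19 | 3.233 / 19 | 3.351 / 30 | 3.434 / 16 | 3.928 / 14 | 3.716 / 34 | 3.648 / 21 | 3.320 / 34 | 3.205 / 34 | 3.407 / 66 | 3.594 / 53 |
| RreliefF | ANN | 3.269 / 9 | 3.306 / 14 | 3.338 / 13 | 3.368 / 17 | 3.445 / 16 | 3.555 / 15 | 4.193 / 35 | 3.760 / 23 | 3.399 / 19 | 3.329 / 20 | 3.427 / 25 | 3.820 / 24 |
| RreliefF | GPR | 3.019 / 134 | 3.156 / 152 | 3.329 / 32 | 3.346 / 117 | 3.460 / 16 | 3.578 / 29 | 3.811 / 34 | 3.715 / 24 | 3.414 / 22 | 3.333 / 28 | 3.358 / 81 | 3.807 / 29 |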
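Table A2. Optimal feature subset construction of different hours from 13:00 to 24:00 with different methods for New England. Entries are MAPE (%) / FD.
| Feature selection | Predictor | 13:00 | 14:00 | 15:00 | 16:00 | 17:00 | 18:00 | 19:00 | 20:00 | 21:00 | 22:00 | 23:00 | 24:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MI | CART | 4.654 / 164 | 5.143 / 164 | 5.764 / 164 | 5.406 / 164 | 6.170 / 41 | 6.313 / 6 | 6.066 / 12 | 5.388 / 9 | 5.166 / 14 | 5.105 / 12 | 5.198 / 6 | 5.086 / 8 |
| MI | RF | 4.663 / 41 | 4.926 / 33 | 5.190 / 38 | 5.351 / 46 | 5.547 / 31 | 5.358 / 98 | 5.117 / 136 | 4.506 / 23 | 4.376 / 28 | 4.779 / 9 | 4.794 / 41 | 4.847 / 72 |
| MI | SVR | 3.927 / 23 | 4.263 / 35 | 4.456 / 49 | 4.518 / 82 | 4.645 / 68 | 4.516 / 69 | 4.418 / 69 | 4.165 / 21 | 3.867 / 24 | 3.901 / 73 | 3.860 / 154 | 4.039 / 115 |
| MI | ANN | 4.042 / 19 | 4.362 / 14 | 4.626 / 35 | 4.799 / 29 | 4.746 / 37 | 5.262 / 46 | 4.849 / 33 | 4.371 / 15 | 3.919 / 27 | 4.599 / 42 | 4.261 / 27 | 4.203 / 35 |
| MI | GPR | 3.974 / 24 | 4.177 / 36 | 4.524 / 29 | 4.689 / 28 | 4.616 / 32 | 4.823 / 21 | 4.770 / 32 | 4.272 / 24 | 4.010 / 26 | 4.157 / 24 | 4.291 / 22 | 4.446 / 18 |
| CMI | CART | 5.459 / 14 | 5.045 / 53 | 5.519 / 24 | 5.406 / 91 | 5.985 / 18 | 6.382 / 5 | 5.973 / 5 | 5.299 / 19 | 5.134 / 20 | 5.178 / 19 | 5.164 / 8 | 4.994 / 14 |
| CMI | RF | 4.445 / 146 | 4.608 / 151 | 4.843 / 150 | 5.064 / 162 | 5.326 / 164 | 5.352 / 157 | 5.136 / 126 | 4.496 / 136 | 4.394 / 126 | 4.176 / 145 | 4.731 / 153 | 4.804 / 133 |
| CMI | SVR | 3.923 / 107 | 4.102 / 97 | 4.339 / 99 | 4.539 / 86 | 4.501 / 105 | 4.540 / 70 | 4.398 / 87 | 3.944 / 113 | 3.717 / 90 | 3.763 / 94 | 3.895 / 93 | 3.972 / 111 |
| CMI | ANN | 3.846 / 64 | 3.827 / 53 | 4.304 / 54 | 4.452 / 49 | 4.698 / 86 | 4.500 / 72 | 4.416 / 94 | 4.273 / 156 | 3.780 / 32 | 3.943 / 76 | 3.971 / 78 | 3.902 / 117 |
| CMI | GPR | 3.916 / 105 | 4.080 / 40 | 4.474 / 53 | 4.686 / 94 | 4.469 / 172 | 4.455 / 168 | 4.709 / 76 | 4.508 / 16 | 4.116 / 39 | 4.075 / 49 | 4.304 / 103 | 4.432 / 118 |
| CART | CART | 4.655 / 11 | 5.141 / 8 | 5.768 / 12 | 5.412 / 9 | 6.320 / 11 | 6.955 / 8 | 6.393 / 9 | 5.978 / 13 | 5.324 / 11 | 5.742 / 9 | 5.214 / 6 | 5.182 / 7 |
| CART | RF | 4.587 / 19 | 4.846 / 15 | 5.113 / 28 | 5.409 / 22 | 5.390 / 25 | 5.720 / 22 | 5.444 / 17 | 4.634 / 15 | 4.665 / 24 | 5.001 / 29 | 4.700 / 7 | 4.806 / 7 |
| CART | SVR | 4.105 / 15 | 4.371 / 21 | 4.393 / 28 | 4.882 / 22 | 4.804 / 24 | 5.515 / 22 | 4.804 / 26 | 4.236 / 30 | 4.099 / 24 | 4.256 / 21 | 4.228 / 18 | 4.168 / 20 |
| CART | ANN | 4.205 / 15 | 4.713 / 18 | 4.666 / 18 | 4.931 / 20 | 5.248 / 16 | 5.205 / 22 | 4.986 / 26 | 4.302 / 26 | 4.248 / 24 | 4.441 / 18 | 4.152 / 17 | 3.988 / 22 |
| CART | GPR | 4.174 / 13 | 4.507 / 21 | 4.573 / 28 | 5.109 / 22 | 5.054 / 23 | 5.124 / 21 | 4.864 / 28 | 4.424 / 32 | 4.304 / 21 | 4.545 / 18 | 4.405 / 17 | 4.321 / 21 |
| RF | CART | 4.448 / 12 | 5.022 / 23 | 5.476 / 6 | 5.350 / 7 | 5.678 / 5 | 6.129 / 5 | 6.040 / 22 | 5.273 / 11 | 4.820 / 26 | 5.175 / 98 | 4.980 / 5 | 5.080 / 45 |
| RF | RF | 4.445 / 7 | 4.617 / 55 | 4.845 / 109 | 5.045 / 52 | 5.226 / 73 | 5.296 / 70 | 5.101 / 85 | 4.484 / 68 | 4.450 / 76 | 4.780 / 75 | 4.711 / 64 | 4.810 / 50 |
| RF | SVR | 3.971 / 100 | 3.983 / 35 | 4.122 / 82 | 4.635 / 94 | 4.556 / 32 | 4.589 / 40 | 4.725 / 101 | 4.089 / 37 | 7.848 / 25 | 3.847 / 97 | 3.936 / 90 | 3.996 / 141 |
| RF | ANN | 4.101 / 17 | 4.638 / 16 | 4.585 / 32 | 4.882 / 13 | 5.205 / 17 | 5.275 / 11 | 5.227 / 11 | 4.413 / 17 | 3.890 / 21 | 4.411 / 10 | 4.328 / 23 | 4.336 / 19 |
| RF | GPR | 4.069 / 48 | 4.091 / 74 | 4.515 / 42 | 4.662 / 41 | 4.912 / 32 | 4.577 / 159 | 4.886 / 56 | 4.491 / 43 | 4.026 / 70 | 4.182 / 81 | 4.367 / 79 | 4.497 / 18 |
| RreliefF | CART | 4.574 / 20 | 4.949 / 21 | 5.529 / 29 | 5.405 / 108 | 5.817 / 22 | 6.238 / 23 | 5.860 / 17 | 5.321 / 24 | 4.796 / 37 | 4.721 / 23 | 5.005 / 9 | 4.986 / 20 |
| RreliefF | RF | 4.365 / 17 | 4.716 / 20 | 4.908 / 74 | 5.065 / 136 | 5.293 / 15 | 5.166 / 16 | 5.029 / 81 | 4.403 / 109 | 4.368 / 140 | 4.613 / 14 | 4.612 / 20 | 4.619 / 16 |
| RreliefF | SVR | 3.693 / 65 | 3.876 / 66 | 4.104 / 52 | 4.426 / 49 | 4.496 / 67 | 4.528 / 41 | 4.558 / 68 | 3.983 / 52 | 3.809 / 44 | 3.803 / 109 | 4.008 / 43 | 4.056 / 38 |
| RreliefF | ANN | 4.089 / 19 | 4.246 / 26 | 4.644 / 20 | 4.902 / 39 | 5.000 / 25 | 5.216 / 16 | 5.161 / 31 | 4.633 / 30 | 4.379 / 23 | 4.218 / 14 | 4.566 / 24 | 4.291 / 21 |
| RreliefF | GPR | 3.980 / 40 | 4.161 / 34 | 4.411 / 33 | 4.632 / 36 | 4.748 / 45 | 4.472 / 160 | 4.391 / 170 | 4.301 / 45 | 3.976 / 43 | 4.284 / 34 | 4.001 / 172 | 4.324 / 16 |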
Remark. The FD in Table 4 means feature dimension.
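Table A3. Optimal feature subset construction of different hours from 1:00 to 12:00 with different methods for Singapore. Entries are MAPE (%) / FD.
| Feature selection | Predictor | 1:00 | 2:00 | 3:00 | 4:00 | 5:00 | 6:00 | 7:00 | 8:00 | 9:00 | 10:00 | 11:00 | 12:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MI | CART | 1.595 / 59 | 1.479 / 57 | 1.402 / 4 | 1.482 / 10 | 1.535 / 6 | 1.624 / 108 | 2.041 / 29 | 2.727 / 62 | 2.515 / 48 | 2.526 / 43 | 2.615 / 59 | 2.451 / 58 |
| MI | RF | 1.349 / 72 | 1.138 / 64 | 1.112 / 61 | 1.137 / 66 | 1.201 / 79 | 1.453 / 75 | 1.836 / 57 | 2.229 / 55 | 2.389 / 55 | 2.359 / 52 | 2.379 / 59 | 2.332 / 58 |
| MI | SVR | 1.225 / 74 | 1.057 / 61 | 1.056 / 58 | 1.025 / 57 | 1.114 / 59 | 1.377 / 90 | 1.401 / 57 | 1.565 / 43 | 1.749 / 44 | 1.723 / 55 | 1.796 / 58 | 1.738 / 57 |
| MI | ANN | 1.349 / 47 | 1.179 / 56 | 1.133 / 10 | 1.276 / 59 | 1.262 / 76 | 1.453 / 33 | 1.558 / 30 | 1.926 / 48 | 2.323 / 58 | 2.319 / 53 | 2.321 / 48 | 2.424 / 21 |
| MI | GPR | 1.170 / 75 | 0.955 / 109 | 0.904 / 92 | 0.963 / 86 | 1.025 / 110 | 1.281 / 101 | 1.452 / 94 | 1.859 / 110 | 2.021 / 113 | 2.039 / 41 | 1.952 / 98 | 1.876 / 99 |
| CMI | CART | 1.528 / 11 | 1.470 / 4 | 1.386 / 4 | 1.396 / 9 | 1.475 / 4 | 1.632 / 62 | 1.998 / 72 | 2.752 / 108 | 2.709 / 135 | 2.534 / 124 | 2.531 / 114 | 2.485 / 125 |
| CMI | RF | 1.266 / 31 | 1.093 / 15 | 1.305 / 33 | 1.052 / 30 | 1.133 / 50 | 1.416 / 44 | 1.847 / 23 | 2.191 / 49 | 2.333 / 42 | 2.345 / 38 | 2.353 / 43 | 2.237 / 43 |
| CMI | SVR | 1.209 / 43 | 1.027 / 28 | 0.950 / 33 | 1.082 / 35 | 1.123 / 55 | 1.316 / 65 | 1.387 / 39 | 1.508 / 34 | 1.651 / 52 | 1.732 / 33 | 1.692 / 44 | 1.631 / 44 |
| CMI | ANN | 1.239 / 17 | 1.062 / 21 | 1.034 / 28 | 1.072 / 31 | 1.149 / 29 | 1.312 / 27 | 1.542 / 23 | 1.979 / 31 | 2.202 / 23 | 2.107 / 28 | 2.037 / 24 | 2.129 / 47 |
| CMI | GPR | 1.148 / 122 | 0.930 / 138 | 0.882 / 163 | 0.900 / 167 | 1.037 / 73 | 1.090 / 122 | 1.470 / 27 | 1.879 / 36 | 2.054 / 43 | 2.012 / 48 | 1.925 / 135 | 1.892 / 50 |
| CART | CART | 1.559 / 14 | 1.528 / 14 | 1.403 / 3 | 1.482 / 12 | 1.608 / 2 | 1.675 / 143 | 2.323 / 50 | 2.718 / 7 | 2.950 / 18 | 2.875 / 19 | 2.593 / 102 | 2.478 / 102 |
| CART | RF | 1.303 / 26 | 1.113 / 12 | 1.100 / 21 | 1.105 / 32 | 1.214 / 25 | 1.431 / 38 | 1.917 / 44 | 2.289 / 19 | 2.417 / 45 | 2.458 / 45 | 2.544 / 56 | 2.477 / 19 |
| CART | SVR | 1.103 / 56 | 0.995 / 36 | 1.031 / 17 | 1.001 / 73 | 1.038 / 22 | 1.138 / 32 | 1.575 / 83 | 1.653 / 25 | 1.803 / 73 | 1.863 / 46 | 1.916 / 53 | 1.845 / 53 |
| CART | ANN | 1.210 / 16 | 1.061 / 21 | 1.043 / 15 | 1.037 / 24 | 1.122 / 32 | 1.252 / 31 | 1.876 / 22 | 1.981 / 27 | 2.082 / 23 | 2.158 / 19 | 2.254 / 28 | 2.274 / 13 |
| CART | GPR | 1.169 / 60 | 1.015 / 74 | 0.902 / 120 | 0.918 / 115 | 1.066 / 25 | 1.216 / 33 | 1.415 / 173 | 1.813 / 172 | 1.959 / 172 | 1.903 / 172 | 1.893 / 172 | 1.851 / 173 |
| RF | CART | 1.594 / 72 | 1.583 / 19 | 1.425 / 2 | 1.525 / 11 | 1.577 / 9 | 1.672 / 14 | 2.147 / 8 | 2.456 / 4 | 2.754 / 10 | 2.710 / 21 | 2.615 / 42 | 2.488 / 29 |
| RF | RF | 1.371 / 5 | 1.089 / 24 | 1.081 / 22 | 1.105 / 31 | 1.190 / 35 | 1.381 / 21 | 1.608 / 10 | 2.042 / 9 | 2.336 / 10 | 2.239 / 18 | 2.334 / 41 | 2.238 / 19 |
| RF | SVR | 1.186 / 58 | 0.993 / 13 | 0.904 / 30 | 0.972 / 38 | 1.035 / 27 | 1.080 / 39 | 1.370 / 20 | 1.483 / 27 | 1.617 / 30 | 1.588 / 29 | 1.682 / 30 | 1.649 / 23 |
| RF | ANN | 1.242 / 17 | 1.016 / 18 | 0.926 / 28 | 1.014 / 44 | 1.033 / 24 | 1.149 / 23 | 1.461 / 11 | 1.689 / 14 | 1.945 / 25 | 1.930 / 31 | 1.953 / 9 | 1.952 / 15 |
| RF | GPR | 1.163 / 38 | 0.949 / 76 | 0.897 / 76 | 0.897 / 60 | 0.946 / 105 | 1.115 / 39 | 1.427 / 27 | 1.835 / 30 | 1.982 / 31 | 1.966 / 26 | 1.952 / 35 | 1.882 / 31 |
| RreliefF | CART | 1.530 / 7 | 1.506 / 9 | 1.283 / 10 | 1.395 / 8 | 1.513 / 9 | 1.574 / 13 | 1.754 / 24 | 2.579 / 38 | 2.579 / 38 | 2.464 / 40 | 2.412 / 43 | 2.295 / 42 |
| RreliefF | RF | 1.300 / 10 | 1.083 / 18 | 1.042 / 38 | 1.070 / 31 | 1.178 / 12 | 1.173 / 14 | 1.448 / 18 | 2.109 / 19 | 2.248 / 57 | 2.216 / 59 | 2.204 / 64 | 2.191 / 9 |
| RreliefF | SVR | 1.197 / 21 | 1.019 / 13 | 1.003 / 14 | 1.047 / 14 | 1.098 / 59 | 1.080 / 26 | 1.343 / 38 | 1.464 / 24 | 1.620 / 42 | 1.671 / 43 | 1.679 / 42 | 1.678 / 34 |
| RreliefF | ANN | 1.242 / 17 | 1.016 / 18 | 0.926 / 28 | 1.014 / 44 | 1.033 / 24 | 1.149 / 23 | 1.645 / 11 | 1.689 / 14 | 1.945 / 25 | 1.930 / 31 | 1.953 / 9 | 1.952 / 15 |
| RreliefF | GPR | 1.159 / 95 | 0.950 / 94 | 0.910 / 95 | 0.925 / 96 | 0.989 / 88 | 1.128 / 16 | 1.442 / 22 | 1.780 / 18 | 1.886 / 34 | 1.901 / 37 | 1.876 / 41 | 1.778 / 42 |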
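Table A4. Optimal feature subset construction of different hours from 13:00 to 24:00 with different methods for Singapore. Entries are MAPE (%) / FD.
| Feature selection | Predictor | 13:00 | 14:00 | 15:00 | 16:00 | 17:00 | 18:00 | 19:00 | 20:00 | 21:00 | 22:00 | 23:00 | 24:00 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MI | CART | 2.501 / 75 | 2.522 / 37 | 2.700 / 43 | 2.637 / 45 | 2.619 / 58 | 2.398 / 51 | 2.113 / 87 | 1.884 / 49 | 1.790 / 64 | 1.588 / 90 | 1.614 / 66 | 1.820 / 13 |
| MI | RF | 2.353 / 49 | 2.376 / 42 | 2.387 / 48 | 2.486 / 44 | 2.534 / 57 | 2.258 / 62 | 2.049 / 49 | 1.793 / 43 | 1.632 / 64 | 1.526 / 59 | 1.485 / 45 | 1.529 / 55 |
| MI | SVR | 1.795 / 49 | 1.850 / 42 | 1.878 / 43 | 1.898 / 43 | 2.010 / 54 | 1.883 / 54 | 1.629 / 111 | 1.469 / 39 | 1.317 / 94 | 1.372 / 40 | 1.337 / 41 | 1.301 / 75 |
| MI | ANN | 2.280 / 25 | 2.362 / 36 | 2.466 / 23 | 2.334 / 32 | 2.259 / 44 | 2.324 / 29 | 1.936 / 45 | 1.797 / 52 | 1.682 / 65 | 1.482 / 41 | 1.450 / 38 | 1.390 / 33 |
| MI | GPR | 1.912 / 95 | 2.036 / 40 | 2.032 / 43 | 2.098 / 43 | 2.095 / 102 | 1.821 / 170 | 1.752 / 101 | 1.554 / 44 | 1.357 / 94 | 1.304 / 91 | 1.283 / 102 | 1.253 / 119 |
| CMI | CART | 2.368 / 111 | 2.749 / 126 | 2.517 / 127 | 2.665 / 113 | 2.481 / 125 | 2.423 / 110 | 1.983 / 115 | 1.884 / 139 | 1.779 / 66 | 1.587 / 121 | 1.580 / 7 | 1.667 / 5 |
| CMI | RF | 2.263 / 37 | 2.307 / 36 | 2.309 / 36 | 2.349 / 28 | 2.342 / 35 | 2.160 / 31 | 1.933 / 40 | 1.672 / 41 | 1.593 / 30 | 1.466 / 29 | 1.456 / 8 | 1.451 / 53 |
| CMI | SVR | 1.702 / 49 | 1.768 / 34 | 1.792 / 45 | 1.914 / 42 | 1.937 / 41 | 1.806 / 33 | 1.646 / 32 | 1.442 / 27 | 1.359 / 33 | 1.301 / 41 | 1.324 / 54 | 1.352 / 37 |
| CMI | ANN | 2.232 / 27 | 2.125 / 30 | 2.163 / 43 | 2.300 / 36 | 2.381 / 23 | 2.218 / 39 | 1.761 / 31 | 1.851 / 21 | 1.488 / 15 | 1.405 / 34 | 1.358 / 22 | 1.377 / 25 |
| CMI | GPR | 1.913 / 48 | 1.978 / 45 | 1.994 / 44 | 2.055 / 40 | 2.054 / 102 | 1.884 / 128 | 1.667 / 171 | 1.448 / 172 | 1.296 / 172 | 1.244 / 171 | 1.239 / 125 | 1.282 / 92 |
| CART | CART | 2.486 / 103 | 2.557 / 4 | 2.630 / 4 | 2.734 / 4 | 2.626 / 151 | 2.398 / 104 | 2.132 / 15 | 1.940 / 20 | 1.911 / 86 | 1.591 / 58 | 1.638 / 156 | 1.734 / 7 |
| CART | RF | 2.414 / 22 | 2.463 / 18 | 2.493 / 46 | 2.622 / 7 | 2.653 / 8 | 2.352 / 17 | 1.909 / 20 | 1.903 / 27 | 1.760 / 24 | 1.448 / 25 | 1.499 / 23 | 1.503 / 45 |
| CART | SVR | 1.871 / 52 | 1.935 / 54 | 1.885 / 54 | 2.115 / 40 | 2.188 / 39 | 1.843 / 54 | 1.601 / 87 | 1.606 / 87 | 1.511 / 69 | 1.322 / 29 | 1.284 / 47 | 1.227 / 85 |
| CART | ANN | 2.218 / 19 | 2.213 / 18 | 2.202 / 19 | 2.387 / 19 | 2.404 / 31 | 2.179 / 19 | 1.832 / 32 | 1.773 / 19 | 1.754 / 12 | 1.438 / 33 | 1.453 / 22 | 1.351 / 17 |
| CART | GPR | 1.871 / 172 | 1.950 / 173 | 1.948 / 168 | 1.981 / 171 | 2.011 / 169 | 1.824 / 168 | 1.663 / 172 | 1.442 / 168 | 1.285 / 169 | 1.235 / 168 | 1.205 / 166 | 1.203 / 169 |
| RF | CART | 2.501 / 67 | 2.759 / 27 | 2.673 / 7 | 2.651 / 34 | 2.617 / 38 | 2.416 / 35 | 1.999 / 13 | 1.794 / 27 | 1.750 / 17 | 1.583 / 52 | 1.638 / 41 | 1.692 / 2 |
| RF | RF | 2.272 / 36 | 2.323 / 33 | 2.328 / 35 | 2.364 / 34 | 2.396 / 39 | 2.098 / 26 | 1.881 / 25 | 1.628 / 19 | 1.527 / 10 | 1.427 / 16 | 1.410 / 15 | 1.478 / 24 |
| RF | SVR | 1.670 / 28 | 1.751 / 23 | 1.857 / 22 | 1.853 / 39 | 1.944 / 36 | 1.789 / 16 | 1.577 / 57 | 1.394 / 22 | 1.299 / 56 | 1.240 / 15 | 1.195 / 12 | 1.244 / 38 |
| RF | ANN | 1.972 / 11 | 2.256 / 10 | 2.224 / 8 | 2.245 / 10 | 2.326 / 13 | 1.949 / 21 | 1.822 / 33 | 1.551 / 7 | 1.419 / 12 | 1.323 / 17 | 1.244 / 19 | 1.396 / 32 |
| RF | GPR | 1.876 / 44 | 1.981 / 30 | 2.125 / 34 | 2.023 / 54 | 2.108 / 42 | 1.893 / 91 | 1.767 / 35 | 1.502 / 32 | 1.377 / 13 | 1.269 / 16 | 1.244 / 18 | 1.265 / 47 |
| RreliefF | CART | 2.323 / 41 | 2.532 / 41 | 2.662 / 42 | 2.533 / 24 | 2.366 / 58 | 2.220 / 42 | 1.949 / 66 | 1.766 / 7 | 1.674 / 15 | 1.605 / 21 | 1.624 / 12 | 1.642 / 10 |
| RreliefF | RF | 2.201 / 84 | 2.298 / 9 | 2.243 / 17 | 2.262 / 14 | 2.271 / 35 | 1.998 / 14 | 1.713 / 18 | 1.468 / 21 | 1.366 / 17 | 1.308 / 15 | 1.337 / 17 | 1.400 / 24 |
| RreliefF | SVR | 1.714 / 41 | 1.768 / 20 | 1.799 / 37 | 1.801 / 15 | 1.815 / 14 | 1.646 / 23 | 1.564 / 20 | 1.409 / 11 | 1.258 / 16 | 1.265 / 24 | 1.252 / 24 | 1.299 / 21 |
| RreliefF | ANN | 1.972 / 11 | 2.256 / 10 | 2.224 / 8 | 2.244 / 10 | 2.326 / 13 | 1.949 / 21 | 1.822 / 33 | 1.551 / 7 | 1.419 / 12 | 1.322 / 12 | 1.244 / 19 | 1.396 / 32 |
| RreliefF | GPR | 1.856 / 29 | 1.882 / 41 | 1.861 / 31 | 1.937 / 41 | 1.936 / 30 | 1.789 / 33 | 1.645 / 95 | 1.450 / 35 | 1.316 / 56 | 1.296 / 41 | 1.279 / 72 | 1.208 / 170 |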

References

  1. He, Y.; Xu, Q.; Wan, J.; Yang, S. Short-term power load probability density forecasting based on quantile regression neural network and triangle kernel function. Energy 2016, 114, 498–512.
  2. Nikmehr, N.; Najafi-Ravadanegh, S. Optimal operation of distributed generations in micro-grids under uncertainties in load and renewable power generation using heuristic algorithm. IET Renew. Power Gener. 2015, 9, 982–990.
  3. Duan, Z.Y.; Gutierrez, B.; Wang, L. Forecasting Plug-In Electric Vehicle Sales and the Diurnal Recharging Load Curve. IEEE Trans. Smart Grid 2014, 5, 527–535.
  4. Ferlito, S.; Adinolfi, G.; Graditi, G. Comparative analysis of data-driven methods online and offline trained to the forecasting of grid-connected photovoltaic plant production. Appl. Energy 2017, 205, 116–129.
  5. Ferruzzi, G.; Cervone, G.; Delle Monache, L.; Graditi, G.; Jacobone, F. Optimal bidding in a Day-Ahead energy market for Micro Grid under uncertainty in renewable energy production. Energy 2016, 106, 194–202.
  6. Feng, Y.H.; Ryan, S.M. Day-ahead hourly electricity load modeling by functional regression. Appl. Energy 2016, 170, 455–465.
  7. Bindiu, R.; Chindris, M.; Pop, G.V. Day-Ahead Load Forecasting Using Exponential Smoothing. Sci. Bull. Petru Maior Univ. Tîrgu Mureș 2009, 6, 89–93.
  8. Al-Hamadi, H.M.; Soliman, S.A. Fuzzy short-term electric load forecasting using Kalman filter. IEE Proc.-Gener. Transm. Distrib. 2012, 153, 217–227.
  9. Lee, C.M.; Ko, C.N. Short-term load forecasting using lifting scheme and ARIMA models. Expert Syst. Appl. 2011, 38, 5902–5911.
  10. Luy, M.; Ates, V.; Barisci, N.; Polat, H.; Cam, E. Short-Term Fuzzy Load Forecasting Model Using Genetic–Fuzzy and Ant Colony–Fuzzy Knowledge Base Optimization. Appl. Sci. 2018, 8, 864.
  11. Xiao, L.Y.; Shao, W.; Liang, L.L.; Wang, C. A combined model based on multiple seasonal patterns and modified firefly algorithm for electrical load forecasting. Appl. Energy 2016, 167, 135–153.
  12. Khotanzad, A.; Zhou, E.; Elragal, H. A neuro-fuzzy approach to short-term load forecasting in a price-sensitive environment. IEEE Trans. Power Syst. 2002, 17, 1273–1282.
  13. Felice, M.D.; Yao, X. Short-Term Load Forecasting with Neural Network Ensembles: A Comparative Study Application Notes. IEEE Comput. Intell. Mag. 2012, 6, 47–56.
  14. Ahmad, A.; Javaid, N.; Alrajeh, N.; Khan, Z.A.; Qasim, U.; Khan, A. A Modified Feature Selection and Artificial Neural Network-Based Day-Ahead Load Forecasting Model for a Smart Grid. Appl. Sci. 2015, 5, 1756–1772.
  15. Che, J.X.; Wang, J.Z.; Tang, Y.J. Optimal training subset in a support vector regression electric load forecasting model. Appl. Soft Comput. 2012, 12, 1523–1531.
  16. Dudek, G. Short-Term Load Forecasting Using Random Forests. Intell. Syst. 2015, 323, 821–828.
  17. Lloyd, R.J. GEFCom2012 hierarchical load forecasting: Gradient boosting machines and Gaussian processes. Int. J. Forecast. 2014, 30, 369–374.
  18. Che, J.X.; Wang, J.Z. Short-term load forecasting using a kernel-based support vector regression combination model. Appl. Energy 2014, 132, 602–609.
  19. Božić, M.; Stojanović, M.; Stajić, Z.; Stajić, N. Mutual Information-Based Inputs Selection for Electric Load Time Series Forecasting. Entropy 2013, 15, 926–942.
  20. Rong, G.; Liu, X. Support vector machine with PSO algorithm in short-term load forecasting. In Proceedings of the 2008 Chinese Control and Decision Conference, Yantai, China, 2–4 July 2008; pp. 1140–1142.
  21. Ma, L.H.; Zhou, S.; Lin, M. Support Vector Machine Optimized with Genetic Algorithm for Short-Term Load Forecasting. In Proceedings of the International Symposium on Knowledge Acquisition and Modeling IEEE, Wuhan, China, 21–22 December 2008; pp. 654–657.
  22. Zhang, Y.J.; Peng, X.Y.; Peng, Y.; Pang, J.Y.; Liu, D.T. Weighted bagging gaussion process regression to predict remaining useful life of electro-mechanical actuator. In Proceedings of the Prognostics and System Health Management Conference, Chengdu, China, 19–21 October 2016; pp. 1–6.
  23. Lahouar, A.; Slama, J.B.H. Day-ahead load forecast using random forest and expert input selection. Energy Convers. Manag. 2015, 103, 1040–1051.
  24. Ghofrani, M.; West, K.; Ghayekhloo, M. Hybrid time series-bayesian neural network short-term load forecasting with a new input selection method. In Proceedings of the 2015 IEEE Power & Energy Society General Meeting, Denver, CO, USA, 26–30 July 2015; pp. 1–5.
  25. Chandrashekar, G.; Sahin, F. A Survey on Feature Selection Methods; Pergamon Press, Inc.: New York, NY, USA, 2014.
  26. Kohavi, R.; John, G.H. Wrappers for feature subset selection. Artif. Intell. 1996, 97, 273–324.
  27. Hu, Z.Y.; Bao, Y.K.; Chiong, R.; Xiong, T. Mid-term interval load forecasting using multi-output support vector regression with a memetic algorithm for feature selection. Energy 2015, 84, 419–431.
  28. Goldberg, D.E. Genetic Algorithms in Search, Optimization and Machine Learning; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1990; pp. 2104–2116.
  29. Hyojoo, S.; Kim, C. Forecasting Short-term Electricity Demand in Residential Sector Based on Support Vector Regression and Fuzzy-rough Feature Selection with Particle Swarm Optimization. Procedia Eng. 2015, 118, 1162–1168.
  30. Isabelle, G.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  31. Koprinska, I.; Rana, M.; Agelidis, V.G. Correlation and instance based feature selection for electricity load forecasting. Knowl.-Based Syst. 2015, 82, 29–40.
  32. Hu, Z.Y.; Bao, Y.K.; Xiong, T.; Chiong, R. Hybrid filter–wrapper feature selection for short-term load forecasting. Eng. Appl. Artif. Intell. 2015, 40, 17–27.
  33. Abedinia, O.; Amjady, N.; Zareipour, H. A New Feature Selection Technique for Load and Price Forecast of Electrical Power Systems. IEEE Trans. Power Syst. 2016, 32, 62–74.
  34. Raza, M.Q.; Khosravi, A. A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372.
  35. Li, S.; Goel, L.; Wang, P. An ensemble approach for short-term load forecasting by extreme learning machine. Appl. Energy 2016, 170, 22–29.
  36. Kononenko, I. Theoretical and Empirical Analysis of ReliefF and RReliefF. Mach. Learn. J. 2003, 53, 23–69.
  37. Breiman, L.I.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees (CART). Biometrics 1984, 40, 17–23.
  38. Breiman, L. Random Forest. Mach. Learn. 2001, 45, 5–32.
  39. He, Y.Y.; Liu, R.; Li, H.Y.; Wang, S.; Lu, X.F. Short-term power load probability density forecasting method using kernel-based support vector quantile regression and Copula theory. Appl. Energy 2017, 185, 254–266.
  40. Yu, F.; Xu, X.Z. A short-term load forecasting model of natural gas based on optimized genetic algorithm and improved BP neural network. Appl. Energy 2014, 134, 102–113.
  41. Seeger, M. Gaussian processes for machine learning. J. Neural Syst. 2011, 14, 69–106.
  42. ISO New England Load Data. Available online: https://www.iso-ne.com/isoexpress/web/reports/pricing/-/tree/zone-info (accessed on 11 November 2014).
  43. Singapore Load Data. Available online: https://www.emcsg.com/PriceInformation#download (accessed on 19 December 2016).
  44. Sheela, K.G.; Deepa, S.N. Review on Methods to Fix Number of Hidden Neurons in Neural Networks. Math. Prob. Eng. 2013, 6, 389–405.
Figure 1. The power load of New England.
Figure 2. Overview of the proposed method.
Figure 3. Normalized feature scores evaluated by the different feature selection methods.
Figure 4. Mean absolute percentage error (MAPE) curves of combinations of feature selection methods and forecasting methods for selecting feature subset.
Figure 5. Prediction from 28 July to 3 August 2013.
Figure 6. Prediction from 22 to 28 December 2013.
Figure 7. Prediction from 21 to 27 June 2015.
Figure 8. Prediction from 8 to 14 November 2015.
Figure 9. Comparison of error and time of training a model with traditional and proposed approaches.
Table 1. The feature information.

| Feature Type | Feature Name | Number of Features |
|---|---|---|
| Endogenous predictor | F_L(t−i), i = 24, 25, …, 168 | 145 |
| Endogenous predictor | F_L(max, d−k), F_L(min, d−k), F_L(mean, d−k), k = 2, 3, 4, 5, 6, 7 | 18 |
| Exogenous predictor | F_D^W, W = 1, 2, 3, 4, 5, 6, 7 | 7 |
| Exogenous predictor | F_W | 2 |
| Exogenous predictor | F_Hour | 1 |
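
As a rough illustration of Table 1, the sketch below assembles the 173 candidate features (145 + 18 + 7 + 2 + 1) for one target hour from an hourly load series. The function name `build_features`, the 0-based indexing, the assumption that the series starts at the first hour of a day, the one-hot day-of-week encoding, the two-flag rest-day/workday encoding, and the 1 to 24 hour numbering are all choices made for this example and are not taken from the paper.

```python
from statistics import mean


def build_features(load, t, day_of_week, is_workday):
    """Candidate feature vector for target hour index t of an hourly series.

    Assumptions (not from the paper): load[0] is the first hour of a day,
    day_of_week is 1..7, and is_workday is a bool.
    """
    if t < 168:
        raise ValueError("need at least 168 h of history before t")

    # Endogenous: lagged loads F_L(t-i), i = 24..168 (145 values).
    lags = [float(load[t - i]) for i in range(24, 169)]

    # Endogenous: max, min, mean of day d-k, k = 2..7 (18 values).
    d = t // 24
    stats = []
    for k in range(2, 8):
        day = load[(d - k) * 24:(d - k + 1) * 24]
        stats += [float(max(day)), float(min(day)), float(mean(day))]

    # Exogenous: one-hot day of week (7), rest-day/workday flags (2), hour (1).
    dow = [1.0 if w == day_of_week else 0.0 for w in range(1, 8)]
    work = [0.0 if is_workday else 1.0, 1.0 if is_workday else 0.0]
    hour = [float(t % 24 + 1)]

    return lags + stats + dow + work + hour
```

Calling `build_features(load, t, day_of_week, is_workday)` for every hour `t >= 168` would give one row of the candidate feature matrix per target hour.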
Table 2. Experimental data description.

| Data Set | New England, 2011 | New England, 2012 | New England, 2013 | Singapore, 2014 | Singapore, 2015 |
|---|---|---|---|---|---|
| Training set | Jan., Feb., Mar., Apr., May, Jun., Jul., Aug., Sept., Oct., Nov., Dec. | Jan., Feb., Apr., Jun., Jul., Aug., Oct., Dec. | - | Jan., Feb., Mar., Apr., May, Jun., Jul., Aug., Sept., Oct., Nov., Dec. | Jan., Apr., Aug., Dec. |
| Validation set | - | Mar., May, Sept., Nov. | - | - | Feb., May, Jul., Oct. |
| Test set | - | - | Jan., Feb., Mar., Apr., May, Jun., Jul., Aug., Sept., Oct., Nov., Dec. | - | Mar., Jun., Sept., Nov. |
Table 3. Top 10 features ranked by the different feature selection methods, corresponding to Figure 3.

| Hour | Method | Top-Ranked Features |
|---|---|---|
| Hour 5 | MI | F_L(t−24), F_L(t−25), F_L(t−26), F_L(t−27), F_L(t−28), F_L(t−29), F_L(min, d−2), F_L(t−30), F_L(mean, d−2), F_L(t−44) |
| Hour 5 | CMI | F_L(t−24), F_L(t−25), F_L(t−29), F_L(t−28), F_L(t−160), F_L(t−26), F_L(t−161), F_L(t−162), F_L(t−27), F_L(max, d−2) |
| Hour 5 | RreliefF | F_L(t−24), F_L(t−25), F_L(t−26), F_L(t−27), F_L(t−28), F_W^0, F_W^1, F_L(t−28), F_L(max, d−2), F_L(t−31) |
| Hour 5 | RF | F_L(t−24), F_L(t−25), F_L(t−163), F_L(t−162), F_L(t−26), F_L(t−164), F_L(t−30), F_L(t−29), F_L(t−160), F_L(t−27) |
| Hour 5 | CART | F_L(t−24), F_L(t−25), F_L(t−26), F_L(t−27), F_L(t−28), F_L(t−30), F_L(t−163), F_L(t−160), F_L(t−161), F_L(t−162) |
| Hour 6 | MI | F_L(t−160), F_L(t−162), F_L(t−161), F_L(t−24), F_L(t−164), F_L(mean, d−7), F_L(t−163), F_L(t−159), F_L(t−28), F_L(t−29) |
| Hour 6 | CMI | F_L(t−161), F_L(t−162), F_L(t−160), F_L(t−163), F_L(t−159), F_L(t−29), F_L(t−145), F_L(t−158), F_L(t−141), F_L(t−65) |
| Hour 6 | RreliefF | F_W^0, F_W^1, F_D^7, F_L(t−24), F_L(t−25), F_L(t−26), F_D^1, F_L(t−28), F_L(t−27), F_L(t−29) |
| Hour 6 | RF | F_L(t−24), F_L(t−162), F_L(t−161), F_L(t−160), F_L(t−30), F_L(t−29), F_L(t−25), F_W^0, F_L(t−163), F_L(mean, d−7) |
| Hour 6 | CART | F_L(mean, d−7), F_L(t−159), F_L(t−147), F_L(t−146), F_L(t−148), F_L(max, d−7), F_L(t−24), F_L(t−25), F_L(t−30), F_L(t−26) |
| Hour 10 | MI | F_L(t−158), F_L(t−159), F_L(t−157), F_L(mean, d−7), F_L(t−160), F_L(t−156), F_L(t−24), F_L(t−154), F_L(t−147), F_L(t−153) |
| Hour 10 | CMI | F_L(t−161), F_L(t−160), F_L(t−162), F_W^0, F_W^1, F_L(t−159), F_L(t−158), F_L(t−157), F_L(t−154), F_L(t−155), F_L(t−159) |
| Hour 10 | RreliefF | F_W^0, F_W^1, F_D^7, F_D^6, F_L(t−26), F_L(t−25), F_L(t−27), F_L(t−24), F_L(t−28), F_D^1 |
| Hour 10 | RF | F_W^1, F_W^0, F_L(t−159), F_L(t−25), F_L(t−160), F_L(t−24), F_L(t−161), F_L(t−26), F_L(t−28), F_L(t−27) |
| Hour 10 | CART | F_L(t−159), F_L(t−158), F_L(t−160), F_L(t−157), F_L(mean, d−7), F_L(t−156), F_L(t−25), F_L(t−27), F_L(t−28), F_L(t−26) |
| Hour 11 | MI | F_L(t−159), F_L(t−157), F_L(t−158), F_L(t−156), F_L(mean, d−7), F_L(t−153), F_L(t−155), F_L(t−152), F_L(t−160), F_L(t−154) |
| Hour 11 | CMI | F_L(t−160), F_L(t−162), F_L(t−161), F_L(t−159), F_L(t−158), F_W^0, F_W^1, F_L(t−154), F_L(t−156), F_L(t−155) |
| Hour 11 | RreliefF | F_W^0, F_W^1, F_D^7, F_L(t−26), F_L(t−27), F_L(t−25), F_L(t−33), F_L(t−24), F_L(t−34), F_L(t−28) |
| Hour 11 | RF | F_W^1, F_W^0, F_L(t−26), F_L(t−27), F_L(t−25), F_L(t−161), F_L(t−157), F_L(t−160), F_L(t−24), F_L(t−158) |
| Hour 11 | CART | F_L(t−157), F_L(t−156), F_L(t−155), F_L(t−153), F_L(t−154), F_L(t−158), F_L(t−26), F_L(t−25), F_L(t−27), F_L(t−28) |
Table 4. Optimal feature subset construction of different hours with mutual information (MI) + random forest (RF) for New England.

| Time | MAPE | FD | Time | MAPE | FD |
|---|---|---|---|---|---|
| 1 | 3.294 | 34 | 13 | 4.663 | 41 |
| 2 | 3.419 | 22 | 14 | 4.926 | 33 |
| 3 | 3.632 | 9 | 15 | 5.190 | 38 |
| 4 | 3.783 | 30 | 16 | 5.351 | 46 |
| 5 | 4.008 | 9 | 17 | 5.547 | 31 |
| 6 | 4.828 | 18 | 18 | 5.358 | 98 |
| 7 | 5.456 | 61 | 19 | 5.117 | 136 |
| 8 | 5.314 | 59 | 20 | 4.506 | 23 |
| 9 | 4.526 | 64 | 21 | 4.376 | 28 |
| 10 | 4.171 | 45 | 22 | 4.779 | 9 |
| 11 | 4.147 | 42 | 23 | 4.794 | 41 |
| 12 | 4.414 | 67 | 24 | 4.847 | 72 |
Remark: FD means the feature dimension.
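
One plausible reading of how an (MAPE, FD) pair such as those in Table 4 could be obtained is the following: rank the candidate features with a filter score, evaluate nested top-m subsets with a predictor, and keep the dimension with the lowest MAPE. The sketch below illustrates that idea only. It swaps in absolute Pearson correlation for the paper's filter methods (MI, CMI, RreliefF, RF, CART) and a 3-nearest-neighbour regressor for the paper's forecasters, purely to stay dependency-free; all function names are hypothetical.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


def abs_corr(x, y):
    """Stand-in relevance score: absolute Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def knn_predict(X_tr, y_tr, X_va, cols, k=3):
    """Stand-in predictor: k-nearest-neighbour regression on columns `cols`."""
    preds = []
    for row in X_va:
        dists = sorted(
            (sum((row[c] - tr[c]) ** 2 for c in cols), y)
            for tr, y in zip(X_tr, y_tr)
        )
        preds.append(sum(y for _, y in dists[:k]) / k)
    return preds


def select_subset(X_tr, y_tr, X_va, y_va):
    """Return (best feature indices, MAPE curve over nested top-m subsets)."""
    n_feat = len(X_tr[0])
    scores = [abs_corr([r[j] for r in X_tr], y_tr) for j in range(n_feat)]
    order = sorted(range(n_feat), key=lambda j: scores[j], reverse=True)
    curve = [
        mape(y_va, knn_predict(X_tr, y_tr, X_va, order[:m]))
        for m in range(1, n_feat + 1)
    ]
    best = min(range(n_feat), key=lambda m: curve[m])
    return order[:best + 1], curve
```

In this sketch, the length of the returned index list plays the role of FD and the minimum of `curve` plays the role of the reported MAPE; running it once per hour would give 24 separate subsets.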
Table 5. Optimal feature subset construction of 1:00 with different methods for New England.

| Feature Selection Method | CART MAPE | CART FD | RF MAPE | RF FD | SVR MAPE | SVR FD | ANN MAPE | ANN FD | GPR MAPE | GPR FD |
|---|---|---|---|---|---|---|---|---|---|---|
| MI | 3.741 | 7 | 3.294 | 34 | 3.064 | 10 | 3.226 | 8 | 3.087 | 119 |
| CMI | 3.729 | 2 | 3.447 | 20 | 3.043 | 13 | 3.062 | 47 | 3.052 | 134 |
| CART | 3.729 | 3 | 3.422 | 11 | 3.068 | 11 | 3.270 | 9 | 3.245 | 8 |
| RF | 3.741 | 7 | 3.533 | 51 | 3.140 | 41 | 3.069 | 18 | 3.099 | 81 |
| RreliefF | 3.741 | 10 | 3.310 | 26 | 3.043 | 18 | 3.269 | 9 | 3.019 | 134 |
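Table 6. Comparison of different combined methods.

| Feature Selection Method | Criterion | CART | RF | GPR | BPNN | SVR |
|---|---|---|---|---|---|---|
| MI | MAPE (%) | 5.027 | 4.376 | 4.223 | 5.705 | 4.286 |
| MI | MAE (MW) | 849.194 | 732.926 | 709.848 | 962.649 | 720.361 |
| MI | RMSE (MW) | 1191.968 | 871.897 | 988.378 | 1323.916 | 921.862 |
| CMI | MAPE (%) | 4.672 | 4.423 | 4.299 | 4.457 | 4.880 |
| CMI | MAE (MW) | 784.550 | 719.337 | 699.910 | 566.609 | 809.988 |
| CMI | RMSE (MW) | 1016.001 | 931.743 | 942.492 | 715.524 | 1027.936 |
| CART | MAPE (%) | 6.179 | 4.936 | 4.449 | 4.910 | 3.634 |
| CART | MAE (MW) | 1034.009 | 833.712 | 752.653 | 823.088 | 599.284 |
| CART | RMSE (MW) | 1282.515 | 1077.501 | 961.304 | 1142.275 | 753.655 |
| RF | MAPE (%) | 4.936 | 4.231 | 4.381 | 4.291 | 4.262 |
| RF | MAE (MW) | 833.712 | 711.268 | 815.776 | 711.438 | 705.789 |
| RF | RMSE (MW) | 1077.501 | 855.686 | 915.139 | 969.156 | 916.701 |
| RreliefF | MAPE (%) | 4.577 | 3.710 | 4.239 | 4.270 | 4.204 |
| RreliefF | MAE (MW) | 786.561 | 629.120 | 717.094 | 710.419 | 700.174 |
| RreliefF | RMSE (MW) | 1072.662 | 781.775 | 1045.609 | 922.320 | 910.103 |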
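Table 7. Comparison of different combined methods.

| Feature Selection Method | Criterion | CART | RF | GPR | BPNN | SVR |
|---|---|---|---|---|---|---|
| MI | MAPE (%) | 5.420 | 5.783 | 4.862 | 5.823 | 4.977 |
| MI | MAE (MW) | 809.153 | 855.560 | 706.073 | 868.632 | 734.877 |
| MI | RMSE (MW) | 1052.017 | 1038.861 | 875.357 | 1059.056 | 897.331 |
| CMI | MAPE (%) | 5.479 | 5.515 | 4.862 | 5.072 | 5.262 |
| CMI | MAE (MW) | 814.890 | 821.482 | 710.464 | 733.701 | 788.030 |
| CMI | RMSE (MW) | 1029.141 | 983.674 | 867.158 | 941.800 | 956.224 |
| CART | MAPE (%) | 6.876 | 5.154 | 4.754 | 5.206 | 4.770 |
| CART | MAE (MW) | 1027.157 | 763.678 | 704.088 | 776.566 | 705.472 |
| CART | RMSE (MW) | 1307.768 | 1031.547 | 892.356 | 1055.224 | 911.921 |
| RF | MAPE (%) | 5.154 | 5.421 | 4.817 | 5.190 | 4.540 |
| RF | MAE (MW) | 763.678 | 795.999 | 697.702 | 757.221 | 666.295 |
| RF | RMSE (MW) | 1031.547 | 955.704 | 858.767 | 961.667 | 849.553 |
| RreliefF | MAPE (%) | 4.985 | 4.830 | 5.026 | 4.689 | 4.207 |
| RreliefF | MAE (MW) | 741.379 | 713.534 | 749.809 | 702.243 | 628.159 |
| RreliefF | RMSE (MW) | 1019.697 | 893.103 | 1034.086 | 931.176 | 810.417 |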
Table 8. Error of load forecasting of different methods with proposed forecasting approach for the whole test set.

| Feature Selection Method | Forecaster | MAPE (%) | RMSE (MW) | MAE (MW) |
|---|---|---|---|---|
| MI | CART | 6.021 | 1360.445 | 934.560 |
| MI | RF | 5.536 | 1260.281 | 864.385 |
| MI | SVR | 4.872 | 1196.775 | 773.447 |
| MI | BPNN | 5.491 | 1320.809 | 865.842 |
| MI | GPR | 4.785 | 1141.372 | 755.325 |
| CMI | CART | 6.088 | 1371.643 | 945.217 |
| CMI | RF | 5.364 | 1235.216 | 841.376 |
| CMI | SVR | 4.870 | 1225.231 | 776.654 |
| CMI | BPNN | 5.054 | 1179.931 | 793.064 |
| CMI | GPR | 4.758 | 1135.260 | 750.937 |
| CART | CART | 6.495 | 1493.344 | 1013.322 |
| CART | RF | 5.364 | 1228.542 | 837.765 |
| CART | SVR | 4.794 | 1158.022 | 758.601 |
| CART | BPNN | 5.414 | 1270.671 | 847.104 |
| CART | GPR | 5.018 | 1176.996 | 790.088 |
| RF | CART | 5.883 | 1322.730 | 911.334 |
| RF | RF | 5.385 | 1236.724 | 843.334 |
| RF | SVR | 5.534 | 1260.281 | 834.385 |
| RF | BPNN | 5.287 | 1248.014 | 827.752 |
| RF | GPR | 4.839 | 1244.614 | 761.119 |
| RreliefF | CART | 5.804 | 1898.190 | 1305.192 |
| RreliefF | RF | 5.202 | 1220.145 | 816.788 |
| RreliefF | SVR | 4.746 | 1229.229 | 759.143 |
| RreliefF | BPNN | 5.175 | 1244.537 | 812.642 |
| RreliefF | GPR | 5.543 | 1410.293 | 883.576 |
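
The three evaluation criteria used in these tables can be computed as in the minimal sketch below. It assumes the standard textbook definitions of MAPE, MAE, and RMSE (the paper's own formulas are not reproduced in this part of the text), and the function names are hypothetical.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the load (MW in the tables)."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the load (MW in the tables)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares the errors before averaging, it penalizes occasional large misses more heavily than MAE, which may explain why the two criteria do not always rank the forecasters in the same order.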
Table 9. Optimal feature subset construction of different hours with MI + RF for Singapore.

| Time | MAPE | FD | Time | MAPE | FD |
|---|---|---|---|---|---|
| 1 | 1.349 | 72 | 13 | 2.353 | 49 |
| 2 | 1.138 | 64 | 14 | 2.376 | 42 |
| 3 | 1.112 | 61 | 15 | 2.387 | 48 |
| 4 | 1.137 | 66 | 16 | 2.486 | 44 |
| 5 | 1.201 | 79 | 17 | 2.534 | 57 |
| 6 | 1.453 | 75 | 18 | 2.258 | 62 |
| 7 | 1.836 | 57 | 19 | 2.049 | 49 |
| 8 | 2.229 | 55 | 20 | 1.793 | 43 |
| 9 | 2.389 | 55 | 21 | 1.632 | 64 |
| 10 | 2.359 | 52 | 22 | 1.526 | 59 |
| 11 | 2.379 | 59 | 23 | 1.485 | 45 |
| 12 | 2.332 | 58 | 24 | 1.529 | 55 |
Table 10. Optimal feature subset construction of 1:00 with different methods for Singapore.

| Feature Selection Method | CART MAPE | CART FD | RF MAPE | RF FD | SVR MAPE | SVR FD | ANN MAPE | ANN FD | GPR MAPE | GPR FD |
|---|---|---|---|---|---|---|---|---|---|---|
| MI | 1.595 | 59 | 1.349 | 72 | 1.225 | 74 | 1.349 | 47 | 1.170 | 75 |
| CMI | 1.528 | 11 | 1.266 | 31 | 1.209 | 43 | 1.239 | 17 | 1.148 | 122 |
| CART | 1.559 | 14 | 1.303 | 26 | 1.103 | 56 | 1.210 | 16 | 1.169 | 60 |
| RF | 1.594 | 72 | 1.371 | 5 | 1.186 | 58 | 1.242 | 17 | 1.163 | 38 |
| RreliefF | 1.530 | 7 | 1.300 | 10 | 1.197 | 21 | 1.242 | 17 | 1.159 | 95 |
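Table 11. Comparison of different combined methods.

| Feature Selection Method | Criterion | CART | RF | GPR | BPNN | SVR |
|---|---|---|---|---|---|---|
| MI | MAPE (%) | 2.321 | 2.145 | 1.439 | 2.719 | 1.662 |
| MI | MAE (MW) | 128.596 | 119.058 | 79.453 | 153.693 | 91.493 |
| MI | RMSE (MW) | 162.801 | 137.462 | 99.346 | 202.410 | 110.360 |
| CMI | MAPE (%) | 2.117 | 1.867 | 1.419 | 3.165 | 1.482 |
| CMI | MAE (MW) | 115.395 | 103.810 | 78.407 | 180.786 | 81.177 |
| CMI | RMSE (MW) | 150.781 | 134.390 | 99.425 | 322.873 | 102.596 |
| CART | MAPE (%) | 2.420 | 2.136 | 1.645 | 1.963 | 1.911 |
| CART | MAE (MW) | 132.823 | 118.571 | 91.358 | 108.851 | 106.369 |
| CART | RMSE (MW) | 175.615 | 143.584 | 139.408 | 160.930 | 152.349 |
| RF | MAPE (%) | 2.213 | 2.000 | 1.435 | 1.702 | 1.404 |
| RF | MAE (MW) | 123.568 | 112.369 | 77.803 | 94.085 | 77.236 |
| RF | RMSE (MW) | 148.988 | 146.759 | 97.686 | 117.295 | 95.627 |
| RreliefF | MAPE (%) | 2.720 | 1.862 | 1.428 | 1.917 | 1.402 |
| RreliefF | MAE (MW) | 154.605 | 103.586 | 79.134 | 105.902 | 74.400 |
| RreliefF | RMSE (MW) | 201.458 | 128.291 | 101.035 | 127.631 | 93.092 |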
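Table 12. Comparison of different combined methods.

| Feature Selection Method | Criterion | CART | RF | GPR | BPNN | SVR |
|---|---|---|---|---|---|---|
| MI | MAPE (%) | 3.895 | 3.854 | 3.806 | 4.273 | 3.637 |
| MI | MAE (MW) | 217.339 | 217.647 | 215.942 | 243.454 | 204.362 |
| MI | RMSE (MW) | 250.640 | 240.934 | 236.196 | 283.816 | 232.913 |
| CMI | MAPE (%) | 3.573 | 3.518 | 3.899 | 5.023 | 3.585 |
| CMI | MAE (MW) | 200.803 | 197.387 | 221.095 | 288.472 | 200.942 |
| CMI | RMSE (MW) | 229.837 | 217.891 | 239.055 | 390.638 | 229.780 |
| CART | MAPE (%) | 3.868 | 3.587 | 4.115 | 3.897 | 3.599 |
| CART | MAE (MW) | 215.523 | 200.915 | 234.630 | 219.650 | 201.124 |
| CART | RMSE (MW) | 260.178 | 225.501 | 272.193 | 254.684 | 235.158 |
| RF | MAPE (%) | 3.799 | 3.711 | 3.851 | 3.871 | 3.599 |
| RF | MAE (MW) | 212.788 | 209.019 | 218.327 | 218.218 | 201.083 |
| RF | RMSE (MW) | 245.087 | 231.296 | 236.664 | 241.936 | 230.831 |
| RreliefF | MAPE (%) | 3.981 | 3.895 | 4.104 | 3.935 | 3.567 |
| RreliefF | MAE (MW) | 222.013 | 219.243 | 233.919 | 221.717 | 200.711 |
| RreliefF | RMSE (MW) | 262.683 | 242.705 | 254.076 | 247.552 | 224.017 |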
Table 13. Error of load forecasting of different methods with proposed forecasting strategy for the whole test set.

| Feature Selection Method | Forecaster | MAPE (%) | RMSE (MW) | MAE (MW) |
|---|---|---|---|---|
| MI | CART | 2.019 | 172.293 | 112.003 |
| MI | RF | 1.668 | 157.946 | 92.817 |
| MI | SVR | 1.474 | 154.191 | 80.67 |
| MI | BPNN | 2.551 | 218.916 | 145.116 |
| MI | GPR | 1.492 | 147.726 | 82.693 |
| CMI | CART | 2.174 | 189.964 | 121.050 |
| CMI | RF | 1.623 | 156.450 | 90.309 |
| CMI | SVR | 1.440 | 151.230 | 78.764 |
| CMI | ANN | 3.072 | 332.424 | 177.185 |
| CMI | GPR | 1.538 | 148.127 | 85.497 |
| CART | CART | 2.219 | 201.990 | 123.030 |
| CART | RF | 1.733 | 164.604 | 96.589 |
| CART | SVR | 1.748 | 188.225 | 96.562 |
| CART | BPNN | 1.954 | 192.515 | 109.282 |
| CART | GPR | 1.774 | 183.266 | 99.119 |
| RF | CART | 2.012 | 172.188 | 111.418 |
| RF | RF | 1.641 | 160.659 | 91.235 |
| RF | SVR | 1.387 | 148.926 | 75.885 |
| RF | BPNN | 1.663 | 158.088 | 92.355 |
| RF | GPR | 1.461 | 145.833 | 81.011 |
| RreliefF | CART | 2.075 | 177.441 | 116.199 |
| RreliefF | RF | 1.608 | 155.962 | 89.551 |
| RreliefF | SVR | 1.373 | 147.585 | 75.118 |
| RreliefF | BPNN | 1.669 | 157.988 | 92.890 |
| RreliefF | GPR | 1.446 | 144.170 | 80.283 |
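Table 14. Comparison of the error of different forecasting approaches with the original feature set.

| Method | Forecaster | New England MAPE (%) | New England MAE (MW) | New England RMSE (MW) | New England Time (s) | Singapore MAPE (%) | Singapore MAE (MW) | Singapore RMSE (MW) | Singapore Time (s) |
|---|---|---|---|---|---|---|---|---|---|
| The proposed method | CART | 5.348 | 738.641 | 1067.723 | 0.106 | 2.166 | 116.742 | 209.413 | 0.275 |
| The proposed method | RF | 4.867 | 671.261 | 941.661 | 10.445 | 1.930 | 105.306 | 199.974 | 10.776 |
| The proposed method | SVR | 4.228 | 580.80 | 849.806 | 0.431 | 1.914 | 103.356 | 196.145 | 0.405 |
| The proposed method | BPNN | 5.167 | 705.324 | 986.974 | 962.457 | 3.133 | 174.104 | 285.083 | 844.257 |
| The proposed method | GPR | 4.242 | 581.889 | 844.076 | 62.102 | 1.523 | 82.573 | 170.478 | 1.569 |
| The traditional method | CART | 5.530 | 778.083 | 1076.316 | 7.976 | 3.597 | 196.391 | 273.112 | 2.601 |
| The traditional method | RF | 7.120 | 975.272 | 1235.783 | 486.263 | 2.088 | 114.732 | 209.064 | 402.743 |
| The traditional method | SVR | 8.522 | 1130.870 | 1556.371 | 123.394 | 5.067 | 267.048 | 361.547 | 91.623 |
| The traditional method | BPNN | 7.120 | 975.272 | 1235.783 | 6170.835 | 4.864 | 267.416 | 408.305 | 4686.007 |
| The traditional method | GPR | 8.017 | 1065.700 | 1387.252 | 1219.056 | 5.072 | 287.277 | 405.181 | 1054.359 |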
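
The modular structure compared here, with one predictor per hour and each predictor using its own feature subset, could be organized roughly as in the sketch below. The class names, the `fit`/`predict` interface, and the `Persistence` stand-in model (which simply returns the first selected feature, for example the load 24 h earlier) are assumptions made for illustration and are not the authors' implementation.

```python
class Persistence:
    """Stand-in model: predicts the first selected feature unchanged."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [row[0] for row in X]


class ModularForecaster:
    """One predictor per hour of day, each with its own feature subset."""

    def __init__(self, make_model, subsets):
        # subsets maps hour -> list of column indices selected for that hour.
        self.subsets = subsets
        self.models = {h: make_model() for h in subsets}

    def fit(self, X, y, hours):
        # Train each hourly model only on the samples of its own hour.
        for h, model in self.models.items():
            rows = [i for i, hr in enumerate(hours) if hr == h]
            cols = self.subsets[h]
            model.fit([[X[i][c] for c in cols] for i in rows],
                      [y[i] for i in rows])
        return self

    def predict(self, X, hours):
        # Route each sample to the model of its hour.
        out = []
        for row, h in zip(X, hours):
            cols = self.subsets[h]
            out.append(self.models[h].predict([[row[c] for c in cols]])[0])
        return out
```

Any model object exposing `fit(X, y)` and `predict(X)` could replace `Persistence`. Each hourly model sees only about 1/24 of the samples and a reduced feature set, which is consistent with the shorter training times reported for the proposed method in Table 14.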
Table 15. Error of load forecasting of different methods with traditional forecasting approach for the whole test set.

| Feature Selection Method | Forecaster | New England MAPE (%) | New England MAE (MW) | New England RMSE (MW) | Singapore MAPE (%) | Singapore MAE (MW) | Singapore RMSE (MW) |
|---|---|---|---|---|---|---|---|
| MI | CART | 8.452 | 1269.711 | 1701.808 | 3.247 | 178.082 | 239.891 |
| MI | RF | 5.911 | 920.201 | 1339.227 | 1.855 | 103.612 | 168.744 |
| MI | SVR | 7.587 | 1116.529 | 1521.691 | 4.246 | 222.376 | 314.547 |
| MI | BPNN | 5.854 | 909.553 | 1390.574 | 2.103 | 115.674 | 176.764 |
| MI | GPR | 5.680 | 881.310 | 1296.119 | 2.161 | 118.833 | 180.018 |
| CMI | CART | 8.420 | 1267.213 | 1705.926 | 3.320 | 182.186 | 241.884 |
| CMI | RF | 5.645 | 878.479 | 1281.361 | 1.838 | 102.269 | 164.790 |
| CMI | SVR | 7.669 | 1134.308 | 1560.853 | 4.206 | 219.965 | 313.680 |
| CMI | BPNN | 7.697 | 1173.160 | 1929.675 | 2.053 | 113.328 | 173.493 |
| CMI | GPR | 6.562 | 1029.708 | 1558.377 | 2.104 | 115.482 | 175.308 |
| CART | CART | 8.420 | 1267.213 | 1705.926 | 3.212 | 175.834 | 238.396 |
| CART | RF | 5.970 | 921.976 | 1318.980 | 1.940 | 108.871 | 175.704 |
| CART | SVR | 7.635 | 1127.940 | 1506.504 | 4.170 | 217.423 | 312.793 |
| CART | BPNN | 6.044 | 922.497 | 1404.137 | 5.026 | 278.908 | 462.199 |
| CART | GPR | 5.904 | 920.084 | 1372.375 | 2.860 | 161.948 | 250.843 |
| RF | CART | 8.056 | 1212.114 | 1653.079 | 3.262 | 179.431 | 242.135 |
| RF | RF | 5.483 | 858.934 | 1306.136 | 1.833 | 102.516 | 167.013 |
| RF | SVR | 7.316 | 1081.404 | 1482.864 | 4.147 | 216.703 | 310.201 |
| RF | BPNN | 5.348 | 831.493 | 1196.481 | 1.790 | 99.181 | 160.264 |
| RF | GPR | 5.774 | 902.872 | 1321.057 | 1.951 | 108.686 | 169.148 |
| RreliefF | CART | 8.056 | 1212.114 | 1653.079 | 3.188 | 174.799 | 237.592 |
| RreliefF | RF | 5.506 | 866.377 | 1333.577 | 2.003 | 111.688 | 176.002 |
| RreliefF | SVR | 7.350 | 1081.259 | 1464.366 | 4.319 | 226.854 | 319.170 |
| RreliefF | BPNN | 5.789 | 894.686 | 1320.901 | 1.958 | 107.762 | 168.592 |
| RreliefF | GPR | 6.015 | 967.163 | 1682.298 | 2.130 | 117.884 | 188.138 |

Share and Cite

MDPI and ACS Style

Lin, L.; Xue, L.; Hu, Z.; Huang, N. Modular Predictor for Day-Ahead Load Forecasting and Feature Selection for Different Hours. Energies 2018, 11, 1899. https://doi.org/10.3390/en11071899
