Next Article in Journal
Gaussian Belief Propagation for Solving Network Utility Maximization with Delivery Contracts
Next Article in Special Issue
Kernel Mixture Correntropy Conjugate Gradient Algorithm for Time Series Prediction
Previous Article in Journal
Refined Multiscale Entropy Using Fuzzy Metrics: Validation and Application to Nociception Assessment
Previous Article in Special Issue
A Novel Active Learning Regression Framework for Balancing the Exploration-Exploitation Trade-Off
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Electricity Consumption Forecasting using Support Vector Regression with the Mixture Maximum Correntropy Criterion

1
School of Automation and Information Engineering, Xi’an University of Technology, Xi’an 710048, China
2
State Key Laboratory of Electrical Insulation and Power Equipment, Xi’an Jiaotong University, Xi’an 710048, China
3
School of Statistics, Xi’an University of Finance and Economics, Xi’an 710100, China
*
Author to whom correspondence should be addressed.
Entropy 2019, 21(7), 707; https://doi.org/10.3390/e21070707
Submission received: 28 May 2019 / Revised: 5 July 2019 / Accepted: 15 July 2019 / Published: 19 July 2019
(This article belongs to the Special Issue Information Theoretic Learning and Kernel Methods)

Abstract

:
The electricity consumption forecasting (ECF) technology plays a crucial role in the electricity market. The support vector regression (SVR) is a nonlinear prediction model that can be used for ECF. The electricity consumption (EC) data are usually nonlinear and non-Gaussian and present outliers. The traditional SVR with the mean-square error (MSE), however, is insensitive to outliers and cannot correctly represent the statistical information of errors in non-Gaussian situations. To address this problem, a novel robust forecasting method is developed in this work by using the mixture maximum correntropy criterion (MMCC). The MMCC, as a novel cost function of information theoretic, can be used to solve non-Gaussian signal processing; therefore, in the original SVR, the MSE is replaced by the MMCC to develop a novel robust SVR method (called MMCCSVR) for ECF. Besides, the factors influencing users’ EC are investigated by a data statistical analysis method. We find that the historical temperature and historical EC are the main factors affecting future EC, and thus these two factors are used as the input in the proposed model. Finally, real EC data from a shopping mall in Guangzhou, China, are utilized to test the proposed ECF method. The forecasting results show that the proposed ECF method can effectively improve the accuracy of ECF compared with the traditional SVR and other forecasting algorithms.

1. Introduction

1.1. Motivation

Electricity consumption forecasting (ECF) refers to the estimation or forecast of the future monthly electricity sales by collating and analyzing historical data. Whether for the bid evaluation system of the State Grid Corporation or for the open electricity market today, electricity sales forecasting is very important, especially for power sales companies, since it will directly affect the deviation assessment. The bigger the deviation, the higher the penalty. Accurate prediction of monthly electricity sales is of vital practical significance and practical value to the power sales company.
ECF is a new problem challenging the current power sales company. In essence it consists of load forecasting. However, compared with load forecasting, it has some new characteristics, such as fewer historical load data and stronger randomness.

1.2. Literature Review

Extensive research has been conducted on load forecasting. Ref. [1] established a long-term power load forecasting model by using a support vector machine (SVM) model based on the comprehensive consideration of economic factors, social factors, and energy market structure and optimizing a multi-factor medium and the parameters of SVM by using particle swarm optimization (PSO) and an improved cross-validation method. In Ref. [2], the improved fly optimization algorithm (FOA) algorithm was used to optimize the parameters of the wavelet least-squares support vector machine (using the wavelet kernel function instead of the Gauss kernel function in least-squares support vector machines (LSSVM)), and the combined algorithm was applied to medium- and long-term power load forecasting. In Ref. [3], the author combined the Fuzzy Time Series with Seasonal Autoregressive Fractionally Integrated Moving model to predict load changes, and the PSO algorithm s used to optimize the weights of the model leading to a good prediction result. In Ref. [4], the author proposed a hybrid optimization grey model called Rolling-ALO-GM (1,1) for ECM based on known information from annual EC by using the grey control theory. In Ref. [5], the author used the Drosophila optimization algorithm with decreasing step size (SFOA) to optimize the extended parameters of generalized regression neural network (GRNN). By combining it with the periodic characteristics of weather factors and power load, a SFOA–GRNN power load forecasting model was proposed. The experimental results showed that the improved model was much better than the original model.
The SVM is a supervised learning algorithm which can solve the problems of small sample, non-linearity, and high latitude and is more efficient in solving pattern recognition and regression problems. Therefore, it has been widely used in short-term load forecasting. In Ref. [6], in order to improve the accuracy and efficiency of calculation, the secondary sampling strategy of support vector regression (SVR) learning process was designed, and a method of modeling and forecasting short-term load point estimation and its confidence interval length using subsampled SVR ensemble was obtained. In Ref. [7], a new grid algorithm was proposed, which provides a new method for the determination of parametric regression surface. A sequential grid method based on SVR was established for short-term load forecasting. Compared with the standard SVR model, the accuracy of short-term load forecasting was greatly improved. In order to deal with the cyclic nature of electric loads, Ref. [8] proposed the SVR with chaotic cuckoo search (SVRCCS) model, based on using a tent chaotic mapping function to enrich the cuckoo search space and diversify the population to avoid trapping in local optima, and combined the SVRCCS model with a seasonal mechanism, thus obtaining a seasonal SVRCCS model. The numerical results showed that the proposed novel model outperforms other alternative models. Ref. [9] used the proposed chaotic efficient bat algorithm, niche search, and evolution mechanisms to optimize the parameters of the hybrid kernel-based SVR model and used motion data for a real floating platform to evaluate the reliability and effectiveness of the proposed model. Ref. [10] proposed a novel vector field-based SVR method. Through multi-distortions in the sample data space or high-dimensional feature space mapped by a vector field, the optimal feature space was found, in which the high non-linearity between inputs and outputs was approximated by linearity. The results indicated that the proposed method achieves better performance than commonly used methods with regard to the accuracy, robustness, and generalization ability.
However, the loss function of the traditional SVR is the mean-square error (MSE) criterion. The MSE criterion only considers the second-order distance of the prediction error distribution. It is very effective for data with Gaussian characteristics, which results in high accuracy only when predicting a stationary sequence. Compared with the power consumption of large users and stable users, the power consumption of small users is constrained by many external factors, such as orders, operation plans, equipment, environmental factors, meteorological changes, emergencies, and so on. It produces non-linear characteristics, such as randomness and uncertainty, and prediction errors are often non-gaussian. Therefore, the application of MSE-based SVR in power consumption forecasting will be less accurate compared with that for large users and stable users. It produces non-linear characteristics, such as randomness and uncertainty, and prediction errors are often non-gaussian. Therefore, the application of SVR based on MSE in ECF has some shortcomings. The Neural Network Laboratory of the University of Florida, USA, proposed a new definition of the correlation function of stochastic processes, namely, correntropy [11]. When it is taken as the cost function in machine learning and signal processing, a Maximum Correntropy Criterion (MCC) can be defined for non-gauss, non-linear applications. The MCC contains higher moments of probability density function and is suitable for any noise environment. So far, many robust learning algorithms have been developed using MCC [12,13,14,15,16,17,18]. It has been proved that they can perform a robust analysis [19,20,21,22] and effectively deal with non-Gaussian situations and outliers [23,24,25]. In order to further improve the learning performance, Ref. [26] proposed the concept of the maximum mixture correntropy criterion (MMCC), which uses the mixture of two Gauss functions as the core function. The learning algorithm using MMCC has a better performance than the traditional MCC-based learning algorithm and several other advanced learning algorithms. On the basis of the above analysis, MMCC was introduced into SVR for the first time, and an MMCCSVR forecasting model for EC is proposed in this paper.

1.3. Our Contribution

The main contributions of this study can be summarized as follows:
(1) In order to solve the problem due to the fact that traditional SVR based on MSE loss function only has high efficiency in data processing with Gaussian characteristics, which leads to its insensitivity to burst and low accuracy in predicting a non-stationary sequence, MMCC is introduced into SVR as loss function, and an MMCCSVR model is proposed.
(2) The MMCCSVR model is applied to ECF for the first time. The simulation results show that the forecasting accuracy of this method is higher than that of traditional SVR and other algorithms.

1.4. Organization of the Paper

This paper is structured as follows: Section 2 gives an overview of SVR and MMCC. Section 3 proposes the MMCCSVR model for ECF in detail. Section 4 introduces the main steps of forecasting. In Section 5, the model validation is introduced and next, experimental results and comparisons with traditional SVR and other algorithms are shown and discussed. Finally, a brief summary and a description of future work are given in Section 6.

2. Methodology

2.1. SVR

The SVR model has its own unique advantages in dealing with such problems as small amount of data, obvious non-linear characteristics, and high-dimensional pattern recognition. It only needs a small number of support vectors to establish decision functions, which can effectively avoid dimension disasters. Moreover, the reduction or increase of non-support vector data has little impact on the algorithm and has better robustness. SVR uses the kernel function φ ( x ) in substitution of nonlinear mapping, the controlled parameters are few, and the algorithm is simple and easy to implement [27].
In an SVR model, suppose the training set sample is ( x i , y i ) | i = 1 , 2 , , n , and n is the number of samples. The regression problem can be transformed into an optimization problem, as follows.
min   C i = 1 n ξ i 2 + ω 2 s . t .   ω T φ ( x i ) + b y i = ξ i , i = 1 , 2 , , n
where ξ is the error variable, ω is the weight vector, ω H , b is the bias, and C is the penalty parameter; C is used to control the minimization of the estimation error and the function smoothness.
In order to solve the optimization problem, the Lagrange function is developed as
L ( ω , b , ξ , α ) = ( ω T ω ) + C i = 1 n ξ i 2 i = 1 n α i 2 [ ω T φ ( x i ) + b y i ξ i ]
where α = ( α 1 , α 2 , , α n ) is the Lagrange multiplier. Differentiating L with respect to the variables ω , b , ξ , and α , we obtain
L ω = 0 ω = I = 1 n α i φ ( x i ) ,
L b = 0 ω = I = 1 n α i = 0 ,
L ξ i = 0 α i = C ξ i ,
L α i = 0 ω T φ ( x i ) + b + ξ i y i = 0 ,
and, after solving the above functions, we can obtain the solution of the problem in the following form
f ( x ) = i = 1 n ω i K ( x , x i ) + b
where K ( ) is the kernel function. In this paper, we used Gaussian kernel K ( x , x i ) = exp [ x x i 2 γ 2 ] to construct nonlinear SVR since it is widely used in the literature and has shown good learning properties in a variety of applications.
In the following subsection, we introduce the MMCC method.

2.2. Mixture Maximum Correntropy Criterion

First, correntropy is defined as follows
V ( X , Y ) = E [ κ ( X , Y ) ]
where κ ( X , Y ) is the kernel function, and E [ ] is the mathematical expectation. Correntropy is a generalized correlation measure. Suppose { ( x i , y i ) } i = 1 n is of the random variables X , Y , the sample mean estimator of corrrentropy is
V ^ ( X , Y ) = 1 N i = 1 N g ( x i y i , σ )
where g ( x i y i , σ ) = G σ ( ) = exp ( x i y i 2 2 σ 2 ) is the Gaussian kernel with bandwidth σ > 0 .
As a nonlinear similarity measure, correntropy has been successfully used as an efficient optimization cost in signal processing and machine learning [25], and the corresponding cost function is MCC. In order to further improve the learning performance, Ref. [25] proposed a mixture correntropy that uses the mixture of two Gaussian functions as the kernel function; the mixture correntropy is defined by
M ( X , Y ) = E [ λ G σ 1 ( X Y ) + ( 1 λ ) G σ 2 ( X Y ) ]
where σ 1 and σ 2 are the kernel bandwidths of the Gaussian functions G σ 1 ( ) and G σ 2 ( ) , and 0 λ 1 is the mixture coefficient. Hence, Equation (10) can be written as
M ^ ( X , Y ) = 1 n i 1 n ( λ g ( x i y i , σ 1 ) + ( 1 λ ) g ( x i y i , σ 2 ) )
Especially, when σ 1 = σ 2 , MMCC degenerates into MCC.

3. SVR with MMCC

The central idea of SVR with MMCC (MMCCSVR) is to integrate the MMCC regularization technique and the kernel trick in a unified framework. Therefore, the MMCCSVR seeks the optimal regression function f ( x ) = i = 1 n ω i K ( x , x i ) + b by solving the following optimization problem
max ω , b   J ( ω , b ) = C i = 1 n ( λ g ( ω T φ ( x i ) + b y i , σ 1 ) + ( 1 λ ) g ( ω T φ ( x i ) + b y i , σ 2 ) ) ω 2
Concretely, the first term of (12) is based on the aforementioned MMCC (11) in the φ included kernel space and is used as an empirical loss function to measure the fitting error. The second term of (12) makes the required regression estimation f ( x ) as smooth as possible from the perspective of Tikhonov regularization. It can also be interpreted as a regularization to overcome possible overfitting.
However, there are two difficulties to solve (12) directly. The first one is that the correntropy cost (11) is hard to be optimized. Although Liu et al. proposed a steepest descend-based method in Ref. [27] to tackle this problem, it is not very efficient. The second problem concerning (12) is due to the fact that ω may lie in a high or even infinite dimensional feature space induced by nonlinear map φ ( ) , thus limiting the application of conventional optimization methods. Fortunately, (12) can be converted to a more straightforward form by considering the following proposition to tackle the first problem [28,29,30].
Proposition 1.
There exists a convex function ϕ : R R , such that
g ( x , σ ) = max p < 0 ( p x 2 σ 2 ϕ ( p ) )
and, for a fixed x , the maximum is reached at p = g ( x , σ ) .
The proof of Proposition 1 is based on the convex conjugated function theory and can be found in Ref. [31]. Now, we construct the following optimization problem with augmented objective function in an enlarged parameter space:
max ω , b , p J ¯ ( ω T , b , p ) = C i = 1 n ( λ p i ( ω T φ ( x ) + b y i ) 2 σ 1 2 + ( 1 λ ) p i ( ω T φ ( x ) + b y i ) 2 σ 2 2 ϕ ( p i ) ) ω 2
where p = [ p 1 , p 2 , , p n ] T stores the auxiliary variables introduced in the half-quadratic optimization. Then, according to proposition 1, we notice that when [ ω , b ] is fixed, the following equation holds
J ( ω , b ) = max p J ¯ ( ω , b , p )
Hence, we can further get the following equation
max ω , b J ( ω , b ) = max ω , b , p J ¯ ( ω , b , p )
That is to say, maximizing J ( ω , b ) with respect to [ ω , b ] is equivalent to maximizing the augmented function J ¯ ( ω , b , p ) in the enlarged parameter space [ ω , b , p ] . Now, to overcome the second problem, we introduce a slack vector to convert the unconstrained optimization problem (16) to the following constrained problem ξ = [ ξ 1 , ξ 2 , , ξ n ] T , as in standard SVR
max ω , b , ξ , p C 2 i = 1 n ( λ p i ξ i 2 σ 1 2 + ( 1 λ ) p i ξ i 2 σ 2 2 ϕ ( p i ) ) 1 2 ω 2 s . t . ω T φ ( x i ) + b y i = ξ i , i = 1 , 2 , , n
We can choose to iteratively optimize (17) by alternating the optimization with respect to either [ ω , b , ξ i ] or p, while holding the other term fixed.
First, we hold p fixed and maximize (17) with respect to [ ω , b , ξ i ] . By dropping the other unrelated variables and introducing the variable q i = p i , we obtain the following equivalent problem
min ω , b , ξ C 2 i = 1 n ( λ q i ξ i 2 σ 1 2 + ( 1 λ ) q i ξ i 2 σ 2 2 ) + 1 2 ω 2 s . t . ω T φ ( x i ) + b y i = ξ i , i = 1 , 2 , , n
In order to solve the above problem, we introduce the following Lagrangian function
L = C i = 1 n ( λ q i ξ i 2 2 σ 1 2 + ( 1 λ ) q i ξ i 2 2 σ 2 2 ) + 1 2 ω 2 i = 1 m α i ( ω T φ ( x i ) + b y i ξ i )
where α i is a Lagrangian multiplier. Then, by following the Karush-Kuhn-Tucker (KKT) condition [32], we have
L ω = ω i = 1 n α i φ ( x i ) = 0 ω = i = 1 n α i φ ( x i )
L b = 0 i = 1 n α i = 0
L ξ i = C q i λ ξ i σ 1 2 + C q i ( 1 λ ) ξ i σ 2 2 + α i = 0 ξ i = α i σ 1 2 σ 2 2 λ C q i σ 2 2 + ( 1 λ ) C q i σ 1 2
L α i = 0 ω T φ ( x i ) + b y i = ξ i
Combing Equations (20)–(23) leads to the following linear system of equations
[ K + Q e e T 0 ] [ α b ] = [ Y 0 ]
where K is an n × n kernel matrix with K i j = ϕ ( x i ) T ϕ ( x j ) T = K ( x i , x j ) , Y = [ y 1 , y 2 , , y n ] T , e = [ 1 , 1 , , 1 ] T , and Q is a diagonal matrix whose diagonal element Q i i = ( σ 1 2 σ 2 2 λ C q i σ 2 2 + ( 1 λ ) C q i σ 1 2 ) = ( σ 1 2 σ 2 2 λ C p i σ 2 2 + ( 1 λ ) C p i σ 1 2 ) > 0 , since p i < 0 . Noticing that K + Q is symmetric positive-definite and thus invertible, we can efficiently solve (23) by the following equations [33].
b = e T ( K + Q ) 1 Y e T ( K + Q ) 1 e   and   α = ( K + Q ) 1 ( Y e b )
Second, we hold [ ω , b , ξ ] fixed and optimize (17) with respect to p. Actually, according to Proposition 1, the optimal p is directly given by
p i = g ( ξ i , σ ) = exp ( λ ξ i 2 σ 1 2 + ( 1 λ ) ξ i 2 σ 2 2 )
Up to now, we have solved both sub-problems constituting the original optimization problem (17).
For the sake of clarity, we prescribe the proposed MMCCSVR as follows.
First, set the parameter values p i = q i = 1 for all samples, tolerance ε = 1 e 3 , MMCC parameter σ 1 and σ 2 , kernel parameter γ , regularization parameter C , build kernel matrix K .
Step 1. Calculate the diagonal matrix Q and then solve (25) by Cholesky factorization to obtain [ α , b ] .
Step 2. If [ α , b ] changes less than ε , go to Step 4, otherwise go to Step 3.
Step 3. Calculate the error variable ξ i by (22) and update p i = q i by (26) and then go to Step 1.
Step 4. Determine the final regression estimation by (7).
Referring to the convergence analysis method in Ref. [28,29,30], the convergence of the proposed algorithm is proved as follows.
According to Step 1 and Step 2, we have J ¯ ( ω t , b t , p t ) J ¯ ( ω t + 1 , b t + 1 , p t ) and J ¯ ( ω t + 1 , b t + 1 , p t ) J ¯ ( ω t + 1 , b t + 1 , p t + 1 ) . Hence, we can conclude that the sequence J ¯ ( ω t , b t , p t ) , t = 1 , 2 , is non-decreasing. Based on the property of correntropy, we can verify that the objective function J ¯ ( ω , b , p ) is bounded above, since g ( x , σ ) and ω 2 are both bounded above. Due to the above facts and the well-known monotone convergence theorem, the algorithm is guaranteed to converge to its local optimal solution.

4. Electricity Consumption Forecasting Based on MMCCSVR

4.1. Characteristic Analysis of Electricity Consumption Data

By analyzing the characteristics of users’ EC, power sales company can accurately understand the demand response mechanism and formulate scientific marketing strategies. This is of great significance for peak shaving and valley filling, optimizing the EC curve, and improving power quality.
The monthly EC of a shopping mall in Guangdong Province in 2017 was analyzed as an example. Figure 1 shows the daily EC of the commercial property common area of the mall and its corresponding daily maximum temperature curve for 2017, which are normalized values. As shown in Figure 1, the greatest influence factor of Heating, Ventilation and Air Conditioning (HVAC) EC is the temperature, and it can be seen that the total EC of the property public area of the shopping mall fluctuated, obviously, according to the season and had strong non-linear and fluctuating characteristics. Besides the similar fluctuating trends of HVAC and the total EC of the property public area of the shopping mall, the other sub-items showed a steady trend and did not change with time.
Therefore, considering the influence of HVAC EC, it was possible to forecast the EC of the whole commercial property public area and further analyze the influencing factors of HVAC EC.
Comparing the trends of daily air conditioning power consumption and daily maximum temperature, we obtained Figure 2.
In order to further study their correlation, we calculated Pearson correlation coefficients between historical HVAC daily EC values and their corresponding daily maximum temperature values.
ρ X , Y = cov ( X , Y ) δ X , δ Y = E [ ( X μ X ) ( Y μ Y ) ] δ X , δ Y = 0 . 9214
The results showed that there was a great positive correlation between HVAC EC and air temperature. Therefore, air temperature and historical EC were input in the prediction model in order to improve the prediction accuracy of user EC.

4.2. Data Preprocessing

In the process of forecasting, the difference of magnitude of the data may affect the accuracy of forecasting and the speed of forecasting, so the initial data should be normalized to ensure the accuracy of the forecasting results. In the equation below, y i indicates the normalized data. The formula for data normalization is as follows:
y i = y i y min y max y min
Generally, the initial data are normalized to [0,1] to ensure the speed and accuracy of the forecasting process.
After the prediction results are obtained, the obtained results should be anti-normalized to get the actual value of the predicted electricity. The inverse normalization formula is as follows:
y i = ( y max y min ) × y i + y min

4.3. Parameter Optimization

In order to ensure the accuracy of the results and the stability of the model, it is necessary to optimize the parameters of the ECF model. In this paper, the grid optimization method and K-fold cross-validation were combined to optimize the parameters.
It is convenient and simple to use the grid optimization method to optimize the parameters, because it is faster than other intelligent algorithms and can generally find a global optimal solution. The algorithm represents each set of parameters in the form of a range grid, traverses each set of parameters by using the independence of each grid, and then verifies the rationality of the set of parameters by K-fold cross-validation theory. Based on experience, most of the parameters σ 1 and σ 2 of MMCC were selected in [0, 50]. In the SVR model, both C and the kernel parameter γ have a wide range of values, but a too large range of values will lead to a lower search speed. After many experiments, the probability that γ falls into the [0, 1] interval is higher. Therefore, the grid range of the model parameters were specified in this paper as C = [ 0 , 20 ] , γ = [ 0 , 1 ] , σ 1 = [ 0 , 50 ] , σ 2 = [ 0 , 50 ] .

4.4. Model Implementation

This paper proposes an ECF model based on MMCCSVR considering the users’ characteristics. Based on the robustness of MMCC in non-Gaussian sequence processing, the MMCCSVR ECF model was established to predict future EC. The parameters were optimized by combining grid optimization and K-fold cross-validation. The prediction model is
f ( x ) = i = 1 n ω i K ( x , x i ) + b
where x is input data, including historical EC and corresponding temperature data, f ( x ) is the forecasting results of EC.
The specific implementation process is shown in Figure 3.

4.5. Evaluation Criterion

In order to comprehensively and extensively evaluate the effectiveness of the proposed forecasting model, five widely used performance metrics in statistics, i.e., mean absolute error (MAE), root-mean-square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2), were considered. The MAE, RMSE, and MAPE were used to measure the deviation between the actual EC and the prediction load. In addition, R2 indicates the extent to which the EC is predictable. These five metrics are defined as follows:
M A E = 1 N i = 1 N | f ( x ) y i | × 100 %
M A P E = 1 N i = 1 N | f ( x ) y i | y i × 100 %
R M S E = 1 N i = 1 N ( f ( x ) y i ) 2 × 100 %
R 2 = ( N i = 1 N y i × f ( x ) i = 1 N y i i = 1 N f ( x ) ) 2 [ N i = 1 N f ( x ) 2 ( i = 1 N f ( x ) ) 2 ] [ N i = 1 N y i 2 ( i = 1 N y i ) 2 ]

5. Results

In order to verify the practicability and forecasting efficiency of the proposed ECF method, we considered the EC data of a shopping mall in Guangdong Province from 1 January 2018, to 3 June 2018, and the corresponding daily maximum temperature as the research object. We selected the daily EC from 1 January to 3 May as the training sample, forecasted the EC of 31 days from 4 May to 3 June 2018, one day ahead, and evaluated comprehensively the ECF model based on MMCCSVR.

5.1. Parameters Selection

In this subsection, the joint efficiency of the free parameters for the prediction accuracy was investigated by using the grid optimization and conducting many experiments. Because the penalty coefficient C showed various empirical values in different applications, and different λ values correspond to a set of different optimal σ 1 and σ 2 , which can make the prediction accuracy meet the requirements, we fixed C = 7 and λ = 0.3 , and then determined the optimal σ 1 and σ 2 . As can be seen in Figure 4, when σ 1 and σ 2 were within the interval [15,25], the prediction accuracy was higher. Taking σ 1 as an example, we further studied the variation of prediction accuracy with kernel parameters.
Figure 5 and Table 1 show the prediction accuracy corresponding to different σ 1 values. The highest prediction accuracy was achieved at σ 1 = 20 ; therefore, σ 1 = 20 was chosen in this paper. Using the same method, we obtained the optimal σ 2 = 15 in this case.

5.2. Comparison of the Forecast Results Obtained with Different Inputs

According to the parameter selection method described in Section 4.3 and Section 5.1, the parameters were set as: C = 7 , λ = 0.3 , σ 1 = 20 , σ 2 = 15 .
In order to further illustrate the necessity of using the temperature as an input variable, the prediction results of two kinds of inputs were compared with those of using only historical electric energy as input. The final prediction results are shown in Figure 6. It can be seen from Figure 6 that when only historical electric energy was used as input, a better prediction value could be obtained at the smooth point of the curve, but the prediction at the inflection point was not as good as that of the MMCCSVR prediction model with temperature as input. By comparing the error analysis results of the proposed method with those obtained by using only historical EC as input, it was found that the average MAPE was about 3.29% lower than that of the proposed method. Therefore, the prediction accuracy can be improved to a certain extent by using two kinds of input: EC and temperature.
The relative errors of daily forecasts are shown in Table 2. The characteristics of the shopping mall EC vary in the week. From the data, it appears that the EC is low on Monday and peaks on Saturday and Sunday. Between 4 May and 3 June 2018, the weekends were 5–6 May, 12–13 May, 19–20 May, 26–27 May, and 2–3 June. As shown in Table 2, the peak and valley values of EC were accurately predicted in the first half month, whereas the predicted values were mostly lower than the true values in the second half month. According to the analysis of historical electricity and temperature data, the temperature in May 2018 was slightly higher than that in May of the previous year, but the thunderstorm weather in the first half of the month may have affected customer flow and activities in the shopping mall. The weather in the second half of the month continued to be cloudy, and the EC increased, since the temperature was slightly higher than in previous years.
Table 2 shows the predicted values using MMCCSVR and the traditional SVR from 4 May to 3 June 2018, and the MAPE between each predicted value and the corresponding actual value. As can be seen from Table 2, compared with the SVR model, the MMCCSVR model allowed to reduce the MAPE of almost every predicted value, especially when approaching the turning point of fluctuation. The results show that using MMCC as the loss function of SVR can significantly improve the prediction accuracy.

5.3. Comparison of Different Forecasting Methods

In order to test whether the proposed model can effectively improve the prediction accuracy, its prediction results were compared with those of other methods. We set the parameters as follows:
SVR: C = 7 , γ = 0.1 ;
Back-propagation algorithm (BP): 10 Hidden layer.
MMCCSVR: C = 7 , λ = 0.3 , σ 1 = 20 , σ 2 = 15 ;
The comparison results are shown in Figure 7. As shown in Figure 7, at the inflection point of the curve, the prediction efficiency of MMCCSVR is obviously better than those of other algorithms, showing the robustness and stability of the proposed method. Moreover, the predicted values changed appropriately during the analysis period, even coinciding with the true values at some individual points.
The percentages of relative error of the different methods are shown in Table 3. Table 3 shows that MMCCSVR has better performance compared with other algorithms, displaying improved prediction accuracy of both global and outlier values.
The relative errors of MMCCSVR for different inputs compared with those of other methods from 4 May to 3 June 2018, are presented in Figure 8. As shown in Figure 8, the relative error of MMCCSVR was evenly distributed on both sides of 0 and had little fluctuation. The relative error of MMCCSVR using historical electricity as input was very large, due to the fact that the temperature changes were not taken into account. BP and traditional SVR algorithms were unable to predict electricity accurately under the influence of random factors.
Table 4 proves the superiority of MMCCSVR over single-input MMCCSVR and the traditional SVR and BP algorithm for four error evaluation indexes; the four error evaluation indexes were obviously reduced, and in particular, the MAPE of this method was about 1.79%, and the R2 was about 0.9781. Therefore, compared with the SVR method and the BP neural network algorithm, the proposed method effectively improved the prediction accuracy, and when temperature was added as input, the prediction efficiency was further improved.
Considering the change of EC for a shopping mall in different months, we implemented the method by using EC data for a whole year, using 65% of the data as the training set and 35% as the testing set. We selected the EC data from 1 January to 25 August 2018 as the training sample and forecasted the EC data from 26 August to 31 December 2018, one-day ahead. The forecasted results are shown in Figure 9. As shown in Figure 9, when more data were used to validate the model proposed in this paper, the prediction curve was still close to the actual curve. Only a few curve inflection points presented inevitable errors due to random factors and the impact of emergencies, resulting in larger errors. In addition, at the inflection point of the curve, the prediction ability of MMCCSVR was obviously better than those of other algorithms, showing its robustness and stability. The MMCCSVR achieved good overall prediction, with the predicted values even coinciding with the true values at individual points.
The relative errors of MMCCSVR for different inputs compared with those of other methods for 26 August to 31 December 2018, are given in Figure 10. As shown in Figure 10, the relative error of MMCCSVR is evenly distributed on both sides of 0 and has little fluctuation. BP and traditional SVR algorithms were unable to predict electricity accurately under the influence of random factors.
Table 5 proves the superiority of MMCCSVR over the traditional SVR and BP algorithm, as shown by four error evaluation indexes; when the experimental data increased, the four error evaluation indexes were obviously reduced. We found that the MAPE of this method was about 3.86% and R2 was about 0.9846. Therefore, compared with the SVR method and BP neural network algorithm, the proposed method has good predictive performance which can effectively improve the prediction accuracy.

6. Conclusions

Around the urgent demand of ECF technology for power sales companies under the wave of power system reform, a novel forecast method for ECF, called MMCCSVR, is proposed, combining the SVR with MMCC. First, the user’s characteristics were analyzed to determine the factors influencing EC, indicating that the historical temperature and historical EC are the main influencing factors. Second, in view of the strong fluctuation of EC for small consumers and the large influence of random factors, the MMCCSVR algorithm with excellent performance under non-Gaussian noise was selected to construct the prediction model, which was applied to a shopping mall in Guangzhou. The following conclusions were reached:
(1)
Compared with the single-input MMCCSVR prediction model, the single-point prediction accuracy was effectively improved, and the average relative error was reduced.
(2)
Compared with the traditional SVR and other algorithms, the prediction errors of peak and valley values of EC were improved effectively.
(3)
The prediction error MAPE of this model was 1.79% and met the assessment criteria of power deviation in the location of the shopping mall and the prediction accuracy requirement of the power sales company.

Author Contributions

Conceptualization, J.D.; Data curation, X.T.; Investigation, L.A.; Methodology, X.T.; Validation, X.Q. and P.W.; Writing—original draft, X.T.; Writing—review & editing, W.M.

Funding

This work was supported by State Key Laboratory of Electrical Insulation and Power Equipment (EIPE18201), the National Natural Science Foundation of China (51877174), Key Project of Natural Science Basic Research Plan in Shaanxi Province of China) (2019JZ-05), National Key R&D Program of China (2016YFB0901900), State Key Laboratory of Advanced Electromagnetic Engineering and Technology (Huazhong University of Science and Technology, AEET2018KF001).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cao, Y.K.; Shuai, H. Medium and long-term load forecasting based on multi-factors modified PSO-SVM algorithm. Electronics, electrical engineering and information science. In Proceedings of the 2015 International Conference on Electronics, Electrical Engineering and Information Science (EEEIS2015), Guangzhou, China, 7–8 August 2015; pp. 682–693. [Google Scholar]
  2. Niu, D.X.; Ma, T.N.; Liu, B.Y. Power load forecasting by wavelet least squares support vector machine with improved fruit fly optimization algorithm. J. Comb. Optim. 2017, 33, 1122–1143. [Google Scholar]
  3. Sadaei, H.J.; Guimarães, F.G.; da Silva, C.J. Short-term load forecasting method based on fuzzy time series, seasonality and long memory process. Int. J. Approx. Reason. 2017, 83, 196–217. [Google Scholar] [CrossRef]
  4. Zhao, H.; Guo, S. An optimized grey model for annual power load forecasting. Energy 2016, 107, 272–286. [Google Scholar] [CrossRef]
  5. Hu, R.; Wen, S.; Zeng, Z. A short-term power load forecasting model based on the generalized regression neural network with decreasing step fruit fly optimization algorithm. Neurocomputing 2017, 221, 24–31. [Google Scholar] [CrossRef]
  6. Li, Y.; Che, J.; Yang, Y. Subsampled support vector regression ensemble for short term electric load forecasting. Energy 2018, 164, 160–170. [Google Scholar] [CrossRef]
  7. Yang, Y.; Che, J.; Deng, C. Sequential grid approach based support vector regression for short-term electric load forecasting. Appl. Energy 2019, 238, 1010–1021. [Google Scholar] [CrossRef]
  8. Dong, Y.; Zhang, Z.; Hong, W. A hybrid seasonal mechanism with a chaotic cuckoo search algorithm with a support vector regression model for electric load forecasting. Energy 2018, 11, 1009. [Google Scholar] [CrossRef]
  9. Hong, W.; Li, M.; Geng, J. Novel chaotic bat algorithm for forecasting complex motion of floating platforms. Appl. Math. Model. 2019, 72, 425–443. [Google Scholar] [CrossRef]
  10. Zhong, H.; Wang, J.; Jia, H. Vector field-based support vector regression for building energy consumption prediction. Appl. Energy 2019, 242, 403–414. [Google Scholar] [CrossRef]
  11. Liu, W.F.; Pokharel, P.P.; Principe, J.C. Correntropy: Properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 2007, 55, 5286–5298. [Google Scholar] [CrossRef]
  12. Chen, B.D.; Xing, L.; Zhao, H.; Zheng, N.; Principe, J.C. generalized correntropy for robust adaptive filtering. IEEE Trans. Signal Process. 2016, 64, 3376–3387. [Google Scholar] [CrossRef]
  13. Ma, W.; Chen, B.; Duan, J. Diffusion maximum correntropy criterion algorithms for robust distributed estimation. Digit. Signal Process. 2016, 58, 10–19. [Google Scholar] [CrossRef] [Green Version]
  14. Wu, Z.; Shi, J.; Zhang, X. Kernel recursive maximum correntropy. Signal Process. 2015, 117, 11–16. [Google Scholar] [CrossRef]
  15. Ma, W.; Qu, H.; Gui, G. Maximum correntropy criterion based sparse adaptive filtering algorithms for robust channel estimation under non-Gaussian environments. J. Franklin Inst. 2015, 352, 2708–2727. [Google Scholar] [CrossRef] [Green Version]
  16. Qian, G.; Wang, S. Generalized complex correntropy: Application to adaptive filtering of complex data. IEEE Access. 2018, 6, 19113–19120. [Google Scholar] [CrossRef]
  17. Loza, C.A.; Principe, J.C. Generalized correntropy matching pursuit: A novel, robust algorithm for sparse decomposition. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Vancouver, Canada, 25–29 July 2016; pp. 1723–1727. [Google Scholar]
  18. Luo, X.; Sun, J.; Wang, L. Short-term wind speed forecasting via stacked extreme learning machine with generalized correntropy. IEEE Trans. Ind. Inform. 2018, 14, 4963–4971. [Google Scholar] [CrossRef]
  19. Chen, B.; Ma, R.; Yu, S. Granger causality analysis based on quantized minimum error entropy criterion. IEEE Signal Process. Lett. 2019, 26, 347–351. [Google Scholar] [CrossRef]
  20. Wang, S.; Dang, L.; Chen, B. Random Fourier filters under maximum correntropy criterion. IEEE Trans. Circuits I 2018, 65, 3390–3403. [Google Scholar] [CrossRef]
  21. Hou, B.; He, Z.; Zhou, X. Maximum correntropy criterion kalman filter for α-jerk tracking model with non-gaussian noise. Entropy 2017, 19, 648. [Google Scholar] [CrossRef]
  22. Ma, W.; Zheng, D.; Zhang, Z. Sparse-aware bias-compensated adaptive filtering algorithms using the maximum correntropy criterion for sparse system identification with noisy input. Entropy 2018, 20, 407. [Google Scholar] [CrossRef]
  23. Zhou, N.; Xu, Y.; Cheng, H. Maximum correntropy criterion based sparse subspace learning for unsupervised feature selection. IEEE Trans. Circuits Syst. Video 2017, 99, 404–417. [Google Scholar] [CrossRef]
  24. Wang, Y.; Li, Y.; Albu, F. Group-constrained maximum correntropy criterion algorithms for estimating sparse mix-noised channels. Entropy 2017, 19, 432. [Google Scholar] [CrossRef]
  25. Wu, Z.; Peng, S.; Chen, B. Robust Hammerstein adaptive filtering under maximum correntropy criterion. Entropy Switz. 2015, 17, 7149–7166. [Google Scholar] [CrossRef]
  26. Chen, B.; Wang, X.; Lu, N. Mixture correntropy for robust learning. Pattern Recognit. 2018, 79, 318–327. [Google Scholar] [CrossRef]
  27. Brereton, R.G.; Lloyd, G.R. Support vector machines for classification and regression. Analyst 2010, 135, 230–267. [Google Scholar] [CrossRef]
  28. He, R.; Hu, B.G.; Zhang, W.S. Robust principal component analysis based on maximum correntropy criterion. IEEE Trans. Image Process. 2011, 20, 1485–1494. [Google Scholar]
  29. He, R.; Zheng, W.S.; Tan, T. Half-quadratic-based iterative minimization for robust sparse representation. IEEE Trans. Pattern Anal. 2013, 36, 261–275. [Google Scholar]
  30. Wang, Y.; Tang, Y.Y.; Li, L. Correntropy matching pursuit with application to robust digit and face recognition. IEEE Trans. Cybern. 2016, 47, 1354–1366. [Google Scholar] [CrossRef]
  31. Yan, H.; Yuan, X.; Yan, S. Correntropy based feature selection using binary projection. Pattern Recognit. 2011, 44, 2834–2842. [Google Scholar] [CrossRef]
  32. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  33. Ojeda, F.; Suykens, J.; Moor, B.D. Low rank updated LS-SVM classifiers for fast variable selection. Neural Netw. 2008, 21, 437–449. [Google Scholar] [CrossRef]
Figure 1. Monthly electricity consumption (EC) of a shopping mall in 2017.
Figure 1. Monthly electricity consumption (EC) of a shopping mall in 2017.
Entropy 21 00707 g001
Figure 2. Daily EC of the commercial property common area of the mall and its corresponding daily maximum temperature.
Figure 2. Daily EC of the commercial property common area of the mall and its corresponding daily maximum temperature.
Entropy 21 00707 g002
Figure 3. Flow chart of the implementation process.
Figure 3. Flow chart of the implementation process.
Entropy 21 00707 g003
Figure 4. Prediction accuracy varying with the parameters σ 1 and σ 2 .
Figure 4. Prediction accuracy varying with the parameters σ 1 and σ 2 .
Entropy 21 00707 g004
Figure 5. Prediction accuracy varying with the parameter σ 1 .
Figure 5. Prediction accuracy varying with the parameter σ 1 .
Entropy 21 00707 g005
Figure 6. Forecast results of the maximum mixture correntropy criterion support vector regression (MMCCSVR) method for a mall from 4 May to 3 June 2018.
Figure 6. Forecast results of the maximum mixture correntropy criterion support vector regression (MMCCSVR) method for a mall from 4 May to 3 June 2018.
Entropy 21 00707 g006
Figure 7. Forecast results of MMCCSVR compared with those of other methods from 4 May to 3 June 2018.
Figure 7. Forecast results of MMCCSVR compared with those of other methods from 4 May to 3 June 2018.
Entropy 21 00707 g007
Figure 8. Relative error of MMCCSVR with different inputs compared with other methods from 4 May to 3 June 2018.
Figure 8. Relative error of MMCCSVR with different inputs compared with other methods from 4 May to 3 June 2018.
Entropy 21 00707 g008
Figure 9. Forecast results of MMCCSVR for a mall from 26 August to 31 December 2018.
Figure 9. Forecast results of MMCCSVR for a mall from 26 August to 31 December 2018.
Entropy 21 00707 g009
Figure 10. Relative error of MMCCSVR for different inputs compared with those of other methods from 26 August to 31 December 2018.
Figure 10. Relative error of MMCCSVR for different inputs compared with those of other methods from 26 August to 31 December 2018.
Entropy 21 00707 g010
Table 1. Prediction accuracy varying with the parameter σ 1 .
Table 1. Prediction accuracy varying with the parameter σ 1 .
σ 1 1516171819202122232425
Prediction accuracy (%)50.569.283.693.296.098.296.194.891.887.282.4
Table 2. Forecast results of MMCCSVR for a mall from 4 May to 3 June 2018.
Table 2. Forecast results of MMCCSVR for a mall from 4 May to 3 June 2018.
DayActual
(kW·h)
Single Input
(kW·h)
MAPE (%) Two Input
(kW·h)
MAPE (%)
4 May39,44241,4635.12397,190.70
5 May53,82341,4636.4352,141−3.13
6 May40,66657,2838.0138,997−4.10
7 May40,66643,92511.4144,88910.38
8 May48,09945,3053.1648,1000.00
9 May44,92949,6212.3244,735−0.43
10 May42,82045,973−1.8243,1600.79
11 May50,36342,039−2.6050,8200.91
12 May51,38149,056−3.1851,6590.54
13 May55,03959,6638.4054,774−0.48
14 May42,61044,9845.5746,8249.89
15 May42,88644,4083.5542,586−0.70
16 May44,35846,2954.3744,3640.01
17 May42,69944,2243.5742,7770.18
18 May45,32943,647−3.7145,3290.00
19 May56,54952,112−7.8554,990−2.76
20 May55,03953,006−3.6954,984−0.10
21 May42,61041,170−3.3842,090−1.22
22 May53,26849,759−6.5952,574−1.30
23 May52,70750,851−3.5252,8740.32
24 May52,91350,834−3.9352,546−0.69
25 May52,54750,085−4.6952,329−0.41
26 May60,09254,925−8.6060,4080.53
27 May57,18853,659−6.1753,875−5.79
28 May50,61449,047−3.1050,6140.00
29 May51,34148,776−5.0049,713−3.17
30 May53,85751,274−4.8053,825−0.06
31 May53,62050,864−5.1453,8490.43
1 June62,69158,118−7.2966,1435.51
2 June59,69956,645−5.1260,0450.58
3 June56,61953,493−5.5256,348−0.48
MAPE5.08%1.79%
Table 3. Percentage of relative error of different methods. BP: Back-propagation.
Table 3. Percentage of relative error of different methods. BP: Back-propagation.
DayMMCCSVRSVRBP
4 May0.709.1726.81
5 May−3.13−3.88−25.81
6 May−4.1014.5927.14
7 May10.3811.6013.49
8 May0.00−0.120.06
9 May−0.431.542.64
10 May0.794.2117.74
11 May0.91−4.80−42.53
12 May0.54−1.731.49
13 May−0.48−3.81−11.05
14 May9.8911.7813.64
15 May−0.706.124.89
16 May0.015.42−8.01
17 May0.187.595.22
18 May0.004.581.04
19 May−2.76−10.57−16.71
20 May−0.10−7.06−7.08
21 May−1.224.1121.46
22 May−1.30−9.77−15.16
23 May0.32−5.46−10.99
24 May−0.69−6.04−6.78
25 May−0.41−3.941.95
26 May0.53−13.44−9.73
27 May−5.79−10.54−5.53
28 May0.00−5.426.09
29 May−3.17−4.341.30
30 May−0.06−7.40−5.79
31 May0.43−6.56−6.13
1 June5.51−12.45−11.81
2 June0.58−7.79−1.61
3 June−0.48−6.561.96
MAPE1.79%6.84%10.70%
Table 4. Comparison of electricity consumption forecasting (ECF) accuracy. MAPE: mean absolute percentage error, MAE: mean absolute error, RMSE: root-mean-square error, R2: coefficient of determination.
Table 4. Comparison of electricity consumption forecasting (ECF) accuracy. MAPE: mean absolute percentage error, MAE: mean absolute error, RMSE: root-mean-square error, R2: coefficient of determination.
MethodMAPEMAERMSER2
MMCCSVR1.79%875.83871515.2280.9781
MMCCSVR
(Single Input)
5.08%2582.83872836.03480.9150
SVR6.84%3460.87103951.01360.9304
BP10.70%5220.80656957.56020.3541
Table 5. Forecast results of MMCCSVR for a mall from 26 August to 31 December 2018.
Table 5. Forecast results of MMCCSVR for a mall from 26 August to 31 December 2018.
MethodMAPEMAERMSER2
MMCCSVR3.86%1528.22289.70.9846
SVR13.78%3966.43756180.05210.9173
BP10.43%3123.57483978.95820.9481

Share and Cite

MDPI and ACS Style

Duan, J.; Tian, X.; Ma, W.; Qiu, X.; Wang, P.; An, L. Electricity Consumption Forecasting using Support Vector Regression with the Mixture Maximum Correntropy Criterion. Entropy 2019, 21, 707. https://doi.org/10.3390/e21070707

AMA Style

Duan J, Tian X, Ma W, Qiu X, Wang P, An L. Electricity Consumption Forecasting using Support Vector Regression with the Mixture Maximum Correntropy Criterion. Entropy. 2019; 21(7):707. https://doi.org/10.3390/e21070707

Chicago/Turabian Style

Duan, Jiandong, Xuan Tian, Wentao Ma, Xinyu Qiu, Peng Wang, and Lin An. 2019. "Electricity Consumption Forecasting using Support Vector Regression with the Mixture Maximum Correntropy Criterion" Entropy 21, no. 7: 707. https://doi.org/10.3390/e21070707

APA Style

Duan, J., Tian, X., Ma, W., Qiu, X., Wang, P., & An, L. (2019). Electricity Consumption Forecasting using Support Vector Regression with the Mixture Maximum Correntropy Criterion. Entropy, 21(7), 707. https://doi.org/10.3390/e21070707

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop