1. Introduction
Linear regression is an approach that uses the least squares method to model the relationship between a scalar dependent variable and one or more explanatory variables; geometrically, it fits points in the plane with a straight line, or points in a high-dimensional space with a hyperplane. This method is very sensitive to predictors that are in a configuration of near-collinearity. Ridge regression is a variant of linear regression whose goal is to circumvent the problem of predictor collinearity. The ridge regression model is a powerful machine learning technique introduced by Hoerl [1] and Hastie et al. [2]; it is a method from classical statistics that implements a regularized form of least squares regression [3]. Ridge regression is thus an alternative method for learning a function, based on a regularized extension of least squares techniques [4].
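To make the near-collinearity issue concrete, the following minimal sketch (Python with NumPy; the synthetic data, the penalty value, and the bias-free λ-penalty form are assumptions of this illustration) compares an ordinary least squares fit with a ridge fit on two almost-collinear predictors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nearly collinear predictors: x2 is x1 plus a tiny perturbation.
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 1.0 * x2 + 0.1 * rng.normal(size=n)

# Ordinary least squares: (X^T X)^{-1} X^T y -- unstable when X^T X is near-singular.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: (X^T X + lam * I)^{-1} X^T y -- the penalty lam stabilises the solution.
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("OLS coefficients:  ", w_ols)    # often far from (3, 1) because of the collinearity
print("Ridge coefficients:", w_ridge)  # moderate, stable values
```

The ridge penalty shrinks the two coefficients toward a stable compromise, which is exactly the behaviour exploited in the objective functions introduced below.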
Given the data-set
$$T=\{(x_1,y_1),\ldots,(x_N,y_N)\},\qquad x_i\in\mathbb{R}^{n},\ y_i\in\mathbb{R},\ i=1,\ldots,N,\qquad(1)$$
a multiple linear regression function is $f(x)=(w\cdot x)+b$. Here $\mathbb{R}$ represents the real number set, $\mathbb{R}^{n}$ is the $n$-dimensional Euclidean space, $N$ is the number of sample points, and the superscript $T$ denotes the matrix transpose. Linear regression and ridge regression determine the parameter vector $(w,b)$ by minimizing the following objective functions, respectively:
$$\min_{w,b}\ \sum_{i=1}^{N}\bigl(y_i-(w\cdot x_i)-b\bigr)^{2},\qquad(2)$$
$$\min_{w,b}\ \frac{1}{2}\,\|w\|^{2}+\frac{C}{2}\sum_{i=1}^{N}\bigl(y_i-(w\cdot x_i)-b\bigr)^{2}.\qquad(3)$$
The objective function used in ridge regression implements a form of Tikhonov [5] regularization of a sum-of-squares error metric, where the parameter $C>0$ in (3) is a regularization parameter controlling the bias-variance trade-off [6]. This corresponds to penalized maximum likelihood estimation of $f$, under the assumption that the targets have been corrupted by independent and identically distributed (i.i.d.) samples from a Gaussian noise process with zero mean and variance $\sigma^{2}$, i.e., $y_i=f(x_i)+\xi_i$ with $\xi_i\sim N(0,\sigma^{2})$.
The kernel ridge regression model based on the Gaussian-noise characteristic was derived by Saunders et al. [7].
Ridge regression [1,3,5] can also be used to find the hidden nonlinear structure in raw data, where the nonlinear mapping is approximated by means of kernel techniques [7,8,9,10,11]. Therefore, a linear ridge regression model $f(x)=(\varpi\cdot\Phi(x))+b$ is constructed in a feature space $H$ (via a nonlinear map $\Phi:\mathbb{R}^{n}\to H$, with $\varpi\in H$), induced by a nonlinear kernel function defining the inner product $K(x,x')=(\Phi(x)\cdot\Phi(x'))$. The kernel function $K(\cdot,\cdot)$ may be any positive definite Mercer kernel. Therefore, the objective function of the kernel ridge regression model based on the Gaussian noise can be written as
$$\min_{\varpi,b}\ \frac{1}{2}\,\|\varpi\|^{2}+\frac{C}{2}\sum_{i=1}^{N}\bigl(y_i-(\varpi\cdot\Phi(x_i))-b\bigr)^{2}.\qquad(4)$$
Suppose the noise is Gaussian; then the kernel ridge regression model based on the Gaussian-noise characteristic may meet the requirements. However, the noise in wind speed and wind power forecasting does not obey the Gaussian distribution but rather the Beta distribution, so the classic regression techniques are not applicable in this case. The uncertainty of wind power predictions was investigated in [12], where the statistics of the wind power forecasting error were found not to be Gaussian. The work in [13] also found that the output of wind turbine systems is limited between zero and the maximum power and that the error statistics do not follow a normal distribution; it also showed, using chi-squared tests, that the use of the Beta function is justifiable for wind power prediction. In [14], the standard deviation of the data set was a function of the normalized predicted power, i.e., the predicted power divided by the installed wind power capacity. Fabbri et al. [14] pointed out that the normalized production power lies within the interval [0,1] and that the Beta function is more suitable than the standard normal distribution. The literature [15] exhibited the advantages of using the Beta probability distribution function (pdf) instead of the Gaussian pdf for approximating the forecasting error. Based on the above literature [12,13,14,15,16], this work studies the Beta-distributed error $\xi$ between the predicted values and the measured values in wind speed forecasting. The pdf of $\xi$ is
$$h\,\xi^{\,p-1}(1-\xi)^{\,q-1},\qquad \xi\in(0,1),$$
plotted in Figure 1, where $p>0$ and $q>0$ are the shape parameters, $h$ is a normalization factor, and the parameters $p$ and $q$ may be determined from given values of the mean and standard deviation [17].
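As an illustration of how the shape parameters can be obtained from a given mean and standard deviation, the following sketch uses the standard method-of-moments relations for a Beta distribution on (0, 1); the numerical values are hypothetical, and SciPy handles the normalization factor $h$ internally.

```python
from scipy import stats

def beta_params_from_moments(mean, std):
    """Method-of-moments estimates of the Beta shape parameters p, q on (0, 1)."""
    var = std ** 2
    common = mean * (1.0 - mean) / var - 1.0   # must be positive for a valid Beta fit
    p = mean * common
    q = (1.0 - mean) * common
    return p, q

# Hypothetical normalized forecast-error statistics.
p, q = beta_params_from_moments(mean=0.4, std=0.15)
print(p, q)                       # fitted shape parameters
print(stats.beta.pdf(0.5, p, q))  # pdf value at xi = 0.5 (normalization factor h handled by SciPy)
```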
It is not suitable to apply techniques based on the Gaussian-noise model to fit functions from a data-set with Beta noise. In order to solve this problem, this work draws on optimization theory and a Beta-noise loss function to derive a kernel ridge regression method based on the Beta-noise characteristic. It also yields a forecasting technique that can deal with high dimensionality and nonlinearity simultaneously.
This paper is organized as follows. In Section 2, we derive the Beta-noise empirical risk loss using the Bayesian principle. Section 3 describes the proposed kernel ridge regression model based on the Beta noise. Section 4 gives the solution and the algorithm design, based on a Genetic Algorithm, for the model with the Beta-noise characteristic. Numerical experiments applying the proposed model to short-term wind speed and wind power prediction are reported in Section 5. Finally, the conclusions and future work are given in Section 6.
2. Bayesian Principle to Beta-Noise Empirical Risk Loss
Learning to fit data with noise is an important problem in many real-world data mining applications. Given the training set $T$ of (1), suppose the noise is additive:
$$y_i=f(x_i)+\xi_i,\qquad i=1,\ldots,N,\qquad(5)$$
where the $\xi_i$ are i.i.d. random noise variables with mean $\mu$ and standard deviation $\sigma$.
The objective is to find a regressor $f$ minimizing the expected risk [18,19]
$$R[f]=\int c\bigl(y-f(x)\bigr)\,dP(x,y)\qquad(6)$$
based on the empirical data $T$, where $c(\cdot)$ is an empirical risk loss function determining how estimation errors are penalized. Since the distribution $P(x,y)$ is unknown, we can only use the data-set $T$ to estimate a regressor $f$ and minimize $R[f]$. A possible approximation consists of replacing the integration by its empirical estimate, which gives the empirical risk
$$R_{emp}[f]=\frac{1}{N}\sum_{i=1}^{N}c\bigl(y_i-f(x_i)\bigr).\qquad(7)$$
In general, a capacity control term should be added to (7), which leads to the regularized risk functional [18,20]
$$R_{reg}[f]=R_{emp}[f]+\frac{\lambda}{2}\,\|\varpi\|^{2},\qquad(8)$$
where $\lambda>0$ is a regularization constant and $R_{emp}[f]$ is the empirical risk. It is well known that the squared loss $c(\xi)=\frac{1}{2}\xi^{2}$ is the empirical risk loss of the Gaussian-noise characteristic underlying the objective functions (2), (3), and (4). However, what is the empirical risk loss of the model when the noise follows a Beta distribution? The Beta-noise empirical risk loss is derived by the use of the Bayesian principle as follows.
The regressor $f$ is unknown; the objective is to estimate it from the data-set $T$. According to the literature [20,21,22], the optimal empirical risk loss obtained from maximum likelihood is
$$c(\xi)=-\ln \rho(\xi),\qquad(9)$$
where $\rho(\cdot)$ denotes the density of the noise. The maximum likelihood estimation maximizes
$$\rho\bigl(\xi_1,\ldots,\xi_N\mid f\bigr)=\prod_{i=1}^{N}\rho\bigl(y_i-f(x_i)\bigr).\qquad(10)$$
Maximizing (10) is equivalent to minimizing $-\sum_{i=1}^{N}\ln \rho\bigl(y_i-f(x_i)\bigr)$. Using Equation (7), we have
$$R_{emp}[f]=-\frac{1}{N}\sum_{i=1}^{N}\ln \rho\bigl(y_i-f(x_i)\bigr).$$
Suppose the noise in Equation (5) adheres to a Beta distribution with mean $\mu$ and variance $\sigma^{2}$; its density is $\rho(\xi)=h\,\xi^{\,p-1}(1-\xi)^{\,q-1}$ on $(0,1)$ with
$$p=\mu\Bigl(\frac{\mu(1-\mu)}{\sigma^{2}}-1\Bigr),\qquad q=(1-\mu)\Bigl(\frac{\mu(1-\mu)}{\sigma^{2}}-1\Bigr)$$
[13,14], where $h$ is the normalization factor. By Equations (9) and (10), the Beta-noise empirical risk loss is
$$c(\xi)=-\ln h-(p-1)\ln\xi-(q-1)\ln(1-\xi),\qquad \xi\in(0,1).\qquad(11)$$
The empirical risk losses of the Gaussian noise and of the Beta noise with different parameters are shown in Figure 2.
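The sketch below (NumPy; the shape parameters are hypothetical) evaluates the Beta-noise loss of Equation (11), up to its additive constant, alongside the squared loss of the Gaussian case, mirroring the comparison in Figure 2.

```python
import numpy as np

def beta_noise_loss(xi, p, q):
    """Beta-noise loss of Equation (11), up to the additive constant -ln h; defined on (0, 1)."""
    xi = np.asarray(xi, dtype=float)
    return -(p - 1.0) * np.log(xi) - (q - 1.0) * np.log(1.0 - xi)

def gauss_noise_loss(xi):
    """Squared loss corresponding to the Gaussian-noise characteristic."""
    return 0.5 * np.asarray(xi, dtype=float) ** 2

xi = np.linspace(0.01, 0.99, 99)
print(beta_noise_loss(xi, p=4.0, q=6.0)[:3])  # asymmetric; grows without bound near 0 and 1
print(gauss_noise_loss(xi)[:3])               # symmetric around zero
```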
3. Kernel Ridge Regression Model Based on Beta-Noise
It is not appropriate to apply the kernel ridge regression model based on the Gaussian-noise characteristic to tasks in which the noise follows a Beta distribution. Consequently, we use the Beta-noise loss function obtained above by the maximum likelihood method as the optimal loss function and propose a new kernel ridge regression model based on the Beta-noise characteristic.
First, consider constructing the regressor $f(x)=(\varpi\cdot\Phi(x))+b$, where $\Phi:\mathbb{R}^{n}\to H$, $\varpi\in H$, and $b\in\mathbb{R}$. We use kernel techniques and construct the kernel function $K(x,x')=(\Phi(x)\cdot\Phi(x'))$, where $x,x'\in\mathbb{R}^{n}$, $H$ is a Hilbert space, and $(\cdot\,,\cdot)$ is the inner product of $H$. Then we extend these kernel techniques to the ridge regression model based on the Beta-noise characteristic.
Let the set of inputs be $\{x_i\}_{i=1}^{N}$, where $i$ is the index of the $i$-th sample in the data-set $T$. For the general Beta-noise characteristic, the Beta-noise loss function at the sample point $(x_i,y_i)$ of $T$ is given by Formula (11). Owing to the fact that the ridge regression and kernel techniques with the Gaussian-noise characteristic are not suitable for the Beta-noise distribution in time series problems, Formula (11) is selected as the Beta empirical risk loss to overcome this shortcoming. The primal problem of the kernel ridge regression model with the Beta noise can be described as follows:
$$\min_{\varpi,b,\xi}\ \frac{1}{2}\,\|\varpi\|^{2}+C\sum_{i=1}^{N}\bigl[-(p-1)\ln\xi_i-(q-1)\ln(1-\xi_i)\bigr]\qquad(12)$$
$$\text{s.t.}\quad y_i-(\varpi\cdot\Phi(x_i))-b=\xi_i,\quad 0<\xi_i<1,\quad i=1,\ldots,N,$$
where $C>0$ is a penalty parameter and $p,q$ are the shape parameters of the Beta noise.
Theorem 1. The solution of the primal Problem (12) with respect to $\varpi$ exists and is unique.
Proof. The existence of a solution is trivial. The uniqueness is shown as follows. Suppose that $(\varpi_1,b_1,\xi^{(1)})$ and $(\varpi_2,b_2,\xi^{(2)})$ are both optimal solutions of Problem (12) with the common optimal value $F^{*}$. For any $t\in(0,1)$, the convex combination $\bigl(t\varpi_1+(1-t)\varpi_2,\ t b_1+(1-t)b_2,\ t\xi^{(1)}+(1-t)\xi^{(2)}\bigr)$ is again feasible, because the constraints of (12) are affine and the interval $(0,1)$ is convex. For shape parameters $p,q\ge 1$, the Beta loss (11) is convex on $(0,1)$, while $\frac{1}{2}\|\varpi\|^{2}$ is strictly convex in $\varpi$; hence the objective value of the convex combination is at most $tF^{*}+(1-t)F^{*}=F^{*}$, with strict inequality whenever $\varpi_1\neq\varpi_2$. A value strictly smaller than $F^{*}$ would contradict optimality, so $\varpi_1=\varpi_2$. In conclusion, the solution of Problem (12) with respect to $\varpi$ exists and is unique. □
Theorem 2. The dual problem of the primal Problem (12) of the kernel ridge regression model with the Beta noise is the optimization Problem (18) in the Lagrange multipliers $\alpha_i$ ($i=1,\ldots,N$), where $K(x_i,x_j)=(\Phi(x_i)\cdot\Phi(x_j))$ and $C>0$ is a constant.
Proof. Introduce the Lagrange functional of Problem (12),
$$L=\frac{1}{2}\,\|\varpi\|^{2}+C\sum_{i=1}^{N}c(\xi_i)+\sum_{i=1}^{N}\alpha_i\bigl(y_i-(\varpi\cdot\Phi(x_i))-b-\xi_i\bigr),$$
where $c(\cdot)$ is the Beta-noise loss (11). To obtain the minimum of $L$ with respect to the primal variables, take the partial derivatives with respect to $\varpi$, $b$, and $\xi_i$, respectively. From the Karush-Kuhn-Tucker (KKT) conditions we obtain
$$\varpi=\sum_{i=1}^{N}\alpha_i\Phi(x_i),\qquad \sum_{i=1}^{N}\alpha_i=0,\qquad C\,c'(\xi_i)=\alpha_i,\quad i=1,\ldots,N.$$
Substituting these extreme conditions back into the Lagrange functional and maximizing over $\alpha=(\alpha_1,\ldots,\alpha_N)^{T}$, we derive the dual Problem (18) of Problem (12). □
From the KKT condition $C\,c'(\xi_i)=\alpha_i$ we can solve for $\xi_i$; because $\xi_i$ must lie in $(0,1)$, the root outside this interval is rejected and the admissible value is kept. The bias $b$ is then obtained from the equality constraints of Problem (12). Hence the decision-making function of the kernel ridge regression model based on the Beta-noise characteristic is
$$f(x)=\sum_{i=1}^{N}\alpha_i K(x_i,x)+b.$$
Note: The kernel ridge regression of the Gaussian-noise characteristic was discussed in [9,10,11]. The Gaussian empirical risk loss at the sample point $(x_i,y_i)$ is $c(\xi_i)=\frac{1}{2}\xi_i^{2}$. With this loss, the same derivation gives the dual problem of the ridge regression model based on the Gaussian-noise characteristic,
$$\max_{\alpha}\ -\frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j (x_i\cdot x_j)-\frac{1}{2C}\sum_{i=1}^{N}\alpha_i^{2}+\sum_{i=1}^{N}\alpha_i y_i,\qquad \text{s.t.}\ \sum_{i=1}^{N}\alpha_i=0,$$
and the dual problem of the kernel ridge regression model based on the Gaussian-noise characteristic,
$$\max_{\alpha}\ -\frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j K(x_i,x_j)-\frac{1}{2C}\sum_{i=1}^{N}\alpha_i^{2}+\sum_{i=1}^{N}\alpha_i y_i,\qquad \text{s.t.}\ \sum_{i=1}^{N}\alpha_i=0.$$
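A minimal sketch of how the kernel ridge regression model with the Gaussian-noise characteristic can be trained from the dual above: its stationarity and constraint conditions reduce to a single linear system in $(\alpha,b)$, solved below with NumPy. The Gaussian kernel width, the penalty value, and the synthetic data are hypothetical.

```python
import numpy as np

def gauss_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_krr_gauss(X, y, C=10.0, sigma=1.0):
    """Solve the KKT system of the Gaussian-noise dual: [[K + I/C, 1], [1^T, 0]] [alpha; b] = [y; 0]."""
    N = len(y)
    K = gauss_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = K + np.eye(N) / C
    A[:N, N] = 1.0
    A[N, :N] = 1.0
    sol = np.linalg.solve(A, np.append(y, 0.0))
    alpha, b = sol[:N], sol[N]
    return alpha, b, (X, sigma)

def predict(alpha, b, model, X_new):
    """Decision function f(x) = sum_i alpha_i K(x_i, x) + b."""
    X_train, sigma = model
    return gauss_kernel(X_new, X_train, sigma) @ alpha + b

# Tiny synthetic check.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.normal(size=40)
alpha, b, model = fit_krr_gauss(X, y)
print(predict(alpha, b, model, X[:3]), y[:3])
```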
4. Solution Based on Genetic Algorithm
The solution and the algorithm design of the kernel ridge regression model based on the Beta-noise characteristic are given as follows.
- (1) Input the training samples $T=\{(x_1,y_1),\ldots,(x_N,y_N)\}$, where $x_i\in\mathbb{R}^{n}$ and $y_i\in\mathbb{R}$ ($i=1,\ldots,N$).
- (2) Select an appropriate positive penalty parameter $C$ and a suitable kernel function $K(x,x')$.
- (3) Solve the optimization Problem (18) and obtain the optimal solution $\alpha=(\alpha_1,\ldots,\alpha_N)^{T}$.
- (4) Construct the decision-making function $f(x)=\sum_{i=1}^{N}\alpha_i K(x_i,x)+b$ and compute $b$ from the KKT conditions (a numerical sketch is given after this list).
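Since the dual Problem (18) with the Beta loss has no simple closed form, one way to realize steps (1)-(4) numerically is to parameterize the regressor through the kernel expansion $f(x)=\sum_i c_i K(x_i,x)+b$ and minimize the regularized Beta-noise objective directly. The sketch below is only an illustration under assumptions: SciPy's L-BFGS-B optimizer stands in for whichever solver is applied to Problem (18), and the Gaussian kernel, its width, the penalty C, the shape parameters, the clipping safeguard, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def gauss_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def beta_loss(r, p_shape, q_shape, eps=1e-6):
    """Beta-noise loss (11); residuals are clipped into (0, 1) purely as a numerical safeguard."""
    r = np.clip(r, eps, 1.0 - eps)
    return -(p_shape - 1.0) * np.log(r) - (q_shape - 1.0) * np.log(1.0 - r)

def fit_krr_beta(X, y, C=10.0, sigma=1.0, p_shape=4.0, q_shape=6.0):
    N = len(y)
    K = gauss_kernel(X, X, sigma)

    def objective(theta):
        c, b = theta[:N], theta[N]
        f = K @ c + b
        reg = 0.5 * c @ K @ c          # (1/2)||w||^2 under the expansion w = sum_i c_i Phi(x_i)
        return reg + C * beta_loss(y - f, p_shape, q_shape).sum()

    theta0 = np.zeros(N + 1)
    theta0[N] = y.mean() - 0.5         # start with residuals near the middle of (0, 1)
    res = minimize(objective, theta0, method="L-BFGS-B")
    return res.x[:N], res.x[N]

# Hypothetical data: a smooth signal plus Beta-distributed noise in (0, 1).
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(60, 3))
y = X @ np.array([0.5, -0.2, 0.3]) + rng.beta(4.0, 6.0, size=60)
c, b = fit_krr_beta(X, y)
print((gauss_kernel(X, X) @ c + b)[:3], y[:3])
```

In the algorithm above, step (3) instead solves the dual problem; this primal sketch is only meant to show the roles of the Beta loss and of the kernel expansion.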
The determination of the unknown parameters of the model is a complicated process, and an appropriate parameter combination can enhance the regression accuracy of the kernel ridge regression based on the Beta noise. The Genetic Algorithm (GA) [23,24,25] is a search heuristic that mimics the process of natural evolution and is routinely used to generate useful solutions to optimization and search problems. In a GA, the evolution usually starts from a population of randomly generated individuals and proceeds in generations. In each generation, the fitness of every individual in the population is evaluated; multiple individuals are stochastically selected from the current population and modified to form a new population, which is then used in the next iteration of the algorithm. Commonly, the algorithm terminates either when a maximum number of generations has been produced or when a satisfactory fitness level has been reached for the population. If the algorithm terminates because the maximum number of generations is reached, a satisfactory solution may or may not have been found.
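To make the parameter search concrete, the sketch below tunes the penalty $C$ and the Gaussian kernel width $\sigma$ by minimizing a validation RMSE with SciPy's differential evolution, an evolutionary algorithm used here only as a stand-in for the GA described above; the Gaussian-noise closed form is used inside the fitness function to keep the example short, and the search ranges and data are hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution

def gauss_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_predict(X_tr, y_tr, X_va, C, sigma):
    """Closed-form Gaussian-noise kernel ridge fit, then prediction on validation inputs."""
    N = len(y_tr)
    K = gauss_kernel(X_tr, X_tr, sigma)
    A = np.zeros((N + 1, N + 1))
    A[:N, :N] = K + np.eye(N) / C
    A[:N, N] = 1.0
    A[N, :N] = 1.0
    sol = np.linalg.solve(A, np.append(y_tr, 0.0))
    alpha, b = sol[:N], sol[N]
    return gauss_kernel(X_va, X_tr, sigma) @ alpha + b

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(120, 2))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=120)
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

def fitness(params):
    """Validation RMSE as the (to-be-minimised) fitness of an individual (log10 C, log10 sigma)."""
    C, sigma = 10.0 ** params[0], 10.0 ** params[1]
    pred = fit_predict(X_tr, y_tr, X_va, C, sigma)
    return np.sqrt(np.mean((y_va - pred) ** 2))

# Evolutionary search over hypothetical ranges: C in [1e-1, 1e3], sigma in [1e-2, 1e1].
result = differential_evolution(fitness, bounds=[(-1, 3), (-2, 1)], maxiter=30, seed=0)
print("best log10(C), log10(sigma):", result.x, "validation RMSE:", result.fun)
```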
GA is considered one of the modern optimization algorithms for solving combinatorial optimization problems, and it is used here to determine the parameters of the kernel ridge regression model based on the Beta-noise characteristic. Based on the survival and reproduction of the fittest, GA is continually applied to obtain new and better solutions without any pre-assumptions, such as continuity or unimodality [26,27,28]. The proposed model has been implemented in the Matlab 7.8 programming language. The experiments are made on a personal computer with a 3.60 GHz Core(TM) i7-4790 CPU and 8.0 GB of memory under Microsoft Windows XP Professional. The initial parameters of the GA, such as the population size and the crossover and mutation probabilities, are set in advance. Many practical applications show that polynomial and Gaussian kernels perform well under general smoothness assumptions [29]. In this work, polynomial and Gaussian kernels are used as the kernels for the three models:
$$K(x,x')=\bigl((x\cdot x')+1\bigr)^{d},\qquad K(x,x')=\exp\Bigl(-\frac{\|x-x'\|^{2}}{2\sigma^{2}}\Bigr),$$
where $d$ is a positive integer (taken as $d=2$ or $3$) and the kernel width $\sigma$ is positive.
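A minimal sketch of the two kernels just defined (the degree and width values are only examples):

```python
import numpy as np

def polynomial_kernel(x, z, d=2):
    """Polynomial kernel ((x . z) + 1)^d with a positive integer degree d."""
    return (np.dot(x, z) + 1.0) ** d

def gaussian_kernel(x, z, sigma=0.5):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / (2 sigma^2)) with width sigma > 0."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(z)) ** 2) / (2.0 * sigma ** 2))

x, z = np.array([0.3, 0.7]), np.array([0.1, 0.9])
print(polynomial_kernel(x, z, d=3), gaussian_kernel(x, z, sigma=0.5))
```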
As is well known, no prediction model forecasts perfectly. Several criteria, namely the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), and the standard error of prediction (SEP), are used to evaluate the predictive performance of the three models. The four criteria are defined as follows:
$$\mathrm{MAE}=\frac{1}{l}\sum_{i=1}^{l}\bigl|y_i-\hat{y}_i\bigr|,\qquad
\mathrm{RMSE}=\sqrt{\frac{1}{l}\sum_{i=1}^{l}\bigl(y_i-\hat{y}_i\bigr)^{2}},$$
$$\mathrm{MAPE}=\frac{1}{l}\sum_{i=1}^{l}\Bigl|\frac{y_i-\hat{y}_i}{y_i}\Bigr|\times 100\%,\qquad
\mathrm{SEP}=\frac{\mathrm{RMSE}}{\bar{y}}\times 100\%,$$
where $l$ is the size of the selected sample set, $y_i$ is the measured value of data point $i$, $\hat{y}_i$ is the predicted value of data point $i$ ($i=1,\ldots,l$), and $\bar{y}$ is the mean of the measured values [14,15,16].
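The four criteria can be computed directly from the measured and predicted series; a short sketch follows (the SEP convention used here, RMSE divided by the mean of the measured values, matches the definition given above).

```python
import numpy as np

def forecast_errors(y_true, y_pred):
    """Return MAE, RMSE, MAPE (%), and SEP (%) for measured and predicted series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = 100.0 * np.mean(np.abs(err / y_true))   # assumes no zero measured values
    sep = 100.0 * rmse / np.mean(y_true)
    return mae, rmse, mape, sep

print(forecast_errors([5.1, 6.3, 4.8, 7.0], [5.4, 6.0, 5.1, 6.5]))
```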
5. Short-Term Wind Speed and Wind Power Forecasting with Real Data-Set
The proposed kernel ridge regression model based on the Beta noise is applied to an actual multi-factor data-set for wind speed sequence prediction from Jilin Province. The wind speed data contain more than a year of samples collected at ten-minute intervals, and the number of wind speed records is 62,466. The column attributes are the mean, variance, minimum, and maximum, respectively. The short-term wind speed forecast is studied as follows.
Suppose the number of training samples is 2160 (points 1 to 2160, covering 15 days) and the number of test samples is 720 (points 2161 to 2880, covering 5 days). The input vector consists of the wind speeds at the preceding time points, and the output value is the wind speed at the forecasting point. This pattern is used to forecast the wind speed at intervals of 10 and 30 min at each point, respectively [30,31].
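One common way to realize this input-output pattern is to build each input vector from the $m$ most recent wind speed values and use the value $h$ steps ahead as the output; in the sketch below the lag order $m$, the horizon $h$, and the synthetic series are hypothetical, since the exact settings are not recoverable from the extracted text, while the 2160/720 train/test split follows the description above.

```python
import numpy as np

def make_lagged_samples(series, m=6, h=1):
    """Inputs are the m previous values; the output is the value h steps ahead."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for t in range(m, len(series) - h + 1):
        X.append(series[t - m:t])
        y.append(series[t + h - 1])
    return np.array(X), np.array(y)

# Hypothetical 10-min wind speed series; the real data-set has 62,466 records.
rng = np.random.default_rng(4)
speed = 6.0 + np.cumsum(0.1 * rng.normal(size=3000))

X, y = make_lagged_samples(speed, m=6, h=1)     # h=3 would give a 30-min-ahead pattern
X_train, y_train = X[:2160], y[:2160]           # first 2160 samples (15 days)
X_test, y_test = X[2160:2880], y[2160:2880]     # next 720 samples (5 days)
print(X_train.shape, X_test.shape)
```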
1. Forecasting the wind speed at intervals of 10 min
The short-term wind speed sequence forecast results at intervals of 10 min given by the two models based on the Gaussian-noise characteristic ([7,8,32] and [33,34], respectively) and by the proposed kernel ridge regression model based on the Beta-noise characteristic are illustrated in Figure 3. The penalty parameter and the kernel parameter of each model were selected before forecasting.
The MAE, MAPE, RMSE, and SEP indicators are used to evaluate the prediction results of the three models at intervals of 10 min, as shown in Table 1.
2. Forecasting the wind speed at intervals of 30 min
The short-term wind speed sequence forecast results at intervals of 30 min given by the two models based on the Gaussian-noise characteristic and by the proposed kernel ridge regression model based on the Beta-noise characteristic are illustrated in Figure 4. The penalty parameter and the kernel parameter of each model were again selected before forecasting.
The MAE, MAPE, RMSE, and SEP indicators are used to evaluate the prediction results of the three models at intervals of 30 min, as shown in Table 2.
The results of the wind speed forecasting experiments indicate that the proposed model based on the Beta-noise characteristic performs better than the two models based on the Gaussian-noise characteristic in both 10-min and 30-min short-term wind speed forecasting.
Having predicted the short-term wind speed for the Jilin Province wind farm, we can calculate the wind power according to Formula (25):
$$P(v)=\begin{cases}0, & v<v_{ci}\ \text{or}\ v\ge v_{co},\\[2pt] P_{r}\,\dfrac{v-v_{ci}}{v_{r}-v_{ci}}, & v_{ci}\le v<v_{r},\\[2pt] P_{r}, & v_{r}\le v<v_{co},\end{cases}\qquad(25)$$
where $v_{ci}$ and $v_{co}$ represent the cut-in and cut-out wind speeds of the wind turbine, respectively, and $v_{r}$ and $P_{r}$ represent the rated wind speed and rated power of the wind turbine, respectively. Substituting the predicted wind speed into Formula (25), we obtain the predicted wind power.
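A minimal sketch of the power-curve conversion of Formula (25) as reconstructed above, with a linear ramp between the cut-in and rated speeds; the turbine constants are hypothetical.

```python
import numpy as np

def wind_power(v, v_ci=3.0, v_r=12.0, v_co=25.0, p_r=2000.0):
    """Piecewise power curve: zero outside [v_ci, v_co), linear ramp up to rated power p_r (kW)."""
    v = np.asarray(v, dtype=float)
    power = np.zeros_like(v)
    ramp = (v >= v_ci) & (v < v_r)
    power[ramp] = p_r * (v[ramp] - v_ci) / (v_r - v_ci)
    power[(v >= v_r) & (v < v_co)] = p_r
    return power

predicted_speed = np.array([2.5, 5.0, 9.0, 13.0, 26.0])   # e.g., forecast wind speeds (m/s)
print(wind_power(predicted_speed))                        # corresponding predicted wind power
```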
6. Conclusions and Future Work
In this work, we propose a new version of the kernel ridge regression model, based on the Beta noise, to predict systems whose forecasting uncertainty follows a Beta distribution. Novel results have been obtained with this model, which applies the Bayesian principle to obtain the Beta-noise empirical risk loss and improves the prediction accuracy. The numerical experiments are carried out on real-world data (short-term wind speed). Comparing the proposed model with the two models based on the Gaussian-noise characteristic using the MAE, RMSE, MAPE, and SEP criteria verifies the validity and feasibility of the proposed model. Further, the forecasting results indicate that the proposed technique achieves good performance on short-term wind speed forecasting.
In practical regression problems, data uncertainty is inevitable. The observed data are often described at linguistic levels or with ambiguous metrics, such as weather forecasts expressed as dry or wet, sunny or cloudy, and so on. In future work, we should consider developing fuzzy kernel ridge regression algorithms with different noise models.