1. Introduction
Single hidden-layer feedforward neural networks (SLFNs) can approximate any function and form decision boundaries of arbitrary shape if the activation function is chosen properly [1,2,3]. To train SLFNs quickly, Huang et al. proposed a learning algorithm called the “Extreme Learning Machine” (ELM), which randomly assigns the hidden-node parameters and then determines the output weights by the Moore–Penrose generalized inverse [4,5,6]. ELM has been successfully applied to many real-world problems, such as retinal vessel segmentation [7], wind speed forecasting [8,9], water network management [10], path-tracking of autonomous mobile robots [11], modelling of drying processes [12], bearing fault diagnosis [13], cybersecurity defense frameworks [14], crop classification [15], and energy disaggregation [16]. In recent years, ELM has been extended to multilayer ELMs, which play an important role in the deep learning domain [17,18,19,20,21,22,23].
The original ELM is a batch learning algorithm: all samples must be available before ELM trains the SLFN. Whenever new data arrive, ELM has to gather the old and new data together and retrain the SLFN to incorporate the new information. This is a very time-consuming process, and it is even computationally infeasible in applications where frequent and fast training, or even real-time training, is required. Moreover, hardware systems cannot provide enough memory to store an ever-increasing amount of training data. To deal with sequential data, Liang et al. proposed the online sequential ELM (OS-ELM), which learns data one-by-one or chunk-by-chunk with fixed or varying chunk size [24]. OS-ELM can be implemented in common programming languages and runs on general-purpose computing platforms. Moreover, in order to execute OS-ELM faster, Frances-Villora et al. developed an FPGA-based implementation of a tailored OS-ELM algorithm [25], which assumes a one-by-one training strategy. OS-ELM has been successfully adopted in several applications, but it still has some drawbacks. Firstly, OS-ELM may encounter ill-conditioning problems, with resulting fluctuations in the generalization performance of the SLFN, if the number of hidden nodes L is not set appropriately [26,27,28]. Secondly, OS-ELM does not take the timeliness of samples into account, so it cannot be directly employed in time-varying or nonstationary environments.
As a variant of ELM, the Regularized ELM (RELM) [29,30], which is mathematically equivalent to the constrained-optimization-based ELM [31], can achieve better generalization performance than ELM, greatly reduces the effect of randomness in ELM [32,33], and is less sensitive to L. Furthermore, several online sequential RELMs have been developed. Huynh and Won proposed ReOS-ELM [26]. Despite its widespread application, ReOS-ELM does not consider the timeliness of samples. To take this into account, Zhang et al. and Du et al. separately designed online sequential RELMs with a forgetting factor, viz., SF-ELM [34] and RFOS-ELM [35]; Guo and Xu referred to these as FR-OSELM [27,28]. After stating the real optimization cost function of FR-OSELM and analyzing it theoretically, Guo and Xu pointed out that the regularization term in the cost function of FR-OSELM is forgotten and tends to zero as time passes; thus, FR-OSELM will probably run into ill-conditioning problems and become unstable after a long period. Incidentally, a similar or identical cost function, or a recursive solution in which the regularization term wanes gradually with time, is still utilized in [36,37,38].
Recently, online sequential extreme learning machines with persistent regularization and forgetting factors (OSELM-PRFFs) were put forward [27,28,39]; these can avoid the potential singularity or ill-posedness of FR-OSELM. Moreover, two kinds of recursive calculation schemes for OSELM-PRFF have been developed. One is FP-ELM, which directly calculates the precise inverse of the involved matrix at every model update [39]; the others are FGR-OSELM [27] and AFGR-OSELM [28], which recursively compute an approximate inverse of the involved matrix to reduce the computational burden. These online sequential RELMs have been applied successfully to a number of examples. However, although the recursive calculation of the approximate inverse matrix in FGR-OSELM and AFGR-OSELM enhances efficiency, it may render FGR-OSELM and AFGR-OSELM unreliable under certain paradigms or parameter setups. Additionally, the direct calculation of the precise inverse matrix makes FP-ELM inefficient.
The reliability and time efficiency of online learning algorithms are, in general, two important indexes. In real-time applications, such as stock forecasting, modelling of controlled objects and signal processing, the computational efficiency of the online training algorithm for the SLFN is a crucial factor. Here, a new online sequential extreme learning machine with persistent regularization and forgetting factor using Cholesky factorization (CF-OSELM-PRFF) is presented. This paper analyzes and proves the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF. The presented method decomposes the coefficient matrix in Cholesky form at every model-update period, transforms the linear system into two triangular systems with lower and upper triangular coefficient matrices, respectively, and applies forward and backward substitution to solve them. The computational efficiency and prediction accuracy of CF-OSELM-PRFF are evaluated on process identification, classical time series prediction and real electricity load forecasting tasks. The numerical experiments indicate that CF-OSELM-PRFF runs faster than several other representative methods and can provide accurate predictions.
The rest of this paper is organized as follows. Section 2 gives a brief review of RELM, FR-OSELM and the existing OSELM-PRFF algorithms. Section 3 proposes CF-OSELM-PRFF. Performance evaluation is conducted in Section 4. Finally, conclusions are given in Section 5.
2. Brief Review of Related Work
2.1. The RELM
For simplicity, an ELM-based learning algorithm for an SLFN with multiple inputs and a single output is discussed. The output of an SLFN with L hidden nodes can be represented by
f(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x) = h(x)β,        (1)
where a_i and b_i are the learning parameters of the hidden nodes, β = [β_1, β_2, …, β_L]^T is the vector of output weights, and G(a_i, b_i, x) denotes the output of the i-th hidden node with respect to the input x, i.e., the activation function. h(x) = [G(a_1, b_1, x), G(a_2, b_2, x), …, G(a_L, b_L, x)] is a feature mapping from the n-dimensional input space to the L-dimensional hidden-layer feature space. In ELM, a_i and b_i are randomly determined first.
For a given set of N distinct training data samples (x_i, y_i), i = 1, 2, …, N, where x_i is an n-dimensional input vector and y_i is the corresponding scalar observation, RELM may be formulated as [29,30]
where Y = [y_1, y_2, …, y_N]^T denotes the target values of all the samples, H = [h(x_1)^T, h(x_2)^T, …, h(x_N)^T]^T is the mapping matrix for the inputs of all the samples, and λ is the regularization parameter.
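The RELM solution can be sketched in a few lines. The snippet below (Python, not from the original paper) assumes the common ridge-regression form of RELM, min over β of λ·||β||^2 + ||Hβ − Y||^2, whose solution is β = (λI + H^T·H)^(−1)·H^T·Y; the sigmoid activation and function names are illustrative choices.

import numpy as np

def train_relm(X, Y, L=50, lam=1e-3, rng=np.random.default_rng(0)):
    """Batch RELM sketch: random hidden layer, regularized least-squares output weights."""
    n = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(n, L))        # input weights a_i
    b = rng.uniform(-1.0, 1.0, size=L)             # biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))         # hidden-layer output matrix, N x L
    beta = np.linalg.solve(lam * np.eye(L) + H.T @ H, H.T @ Y)   # (λI + H^T H) β = H^T Y
    return A, b, beta

def predict_relm(A, b, beta, X):
    """Apply the learned mapping h(x) and output weights β."""
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta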
2.2. FR-OSELM
For time k, the FR-OSELM algorithm is equivalent to minimizing the following cost function:
where the subscripts i, j and k represent time points, and μ is the forgetting factor.
The partial derivative of the objective function with respect to β_k is
Setting this partial derivative to zero, β_k can be obtained as follows:
Inverting both sides of Equation (7) and applying the Sherman–Morrison–Woodbury formula, P_k can be calculated as follows:
Substituting Equation (6) into Equation (5) yields
It is obvious that the regularization item in the cost function of FR-OSELM will be forgotten and tends to zero as k increases; thus, FR-OSELM will probably run into ill-conditioning problems and become unstable after a long period [32,33].
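As a schematic illustration of this point (a restatement of the cited analysis rather than the paper's exact equations): if the initial regularization term is discounted by the forgetting factor at every step, then after k steps its weight is proportional to

μ^k · λ · ||β_k||^2,

which tends to zero as k grows because 0 < μ < 1, so the λI contribution to the coefficient matrix gradually vanishes and the problem can become ill-conditioned.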
2.3. The Existing Algorithms for OSELM-PRFF
Recently, some papers [27,28,39] have adopted the following cost function for online sequential RELM with a forgetting factor:
Setting the partial derivative of this cost function with respect to β_k to zero, β_k can be obtained as follows:
Here, the regularization item will not be forgotten over time. Moreover, two kinds of recursive calculation approaches for β_k, i.e., FP-ELM and FGR-OSELM, have been proposed.
2.3.1. FP-ELM
The main procedure of FP-ELM can be restated as follows.
For an initial data chunk S_0, let
According to Equation (11), the initial network output weight is
When a new data chunk S_k (k ≥ 1) arrives, the recursive way of updating the output weights can be written as
In Equation (15), the calculation of the matrix inverse, i.e., (λI + K_k)^(−1), makes FP-ELM time-consuming.
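For illustration, an assumed form of the FP-ELM update is sketched below in Python. Since Equations (11)–(15) are not reproduced above, the recursions K_k = μ·K_{k−1} + H_k^T·H_k and Q_k = μ·Q_{k−1} + H_k^T·Y_k, followed by an explicit-inverse solve, are stated as assumptions consistent with the surrounding description, not as the paper's exact formulas.

import numpy as np

def fp_elm_update(K_prev, Q_prev, H_k, Y_k, lam=1e-3, mu=0.99):
    """One assumed FP-ELM step: discount old statistics, add the new chunk,
    then solve with an explicit matrix inverse (the costly part)."""
    K_k = mu * K_prev + H_k.T @ H_k
    Q_k = mu * Q_prev + H_k.T @ Y_k
    beta_k = np.linalg.inv(lam * np.eye(K_k.shape[0]) + K_k) @ Q_k   # explicit inverse, O(L^3)
    return K_k, Q_k, beta_k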
2.3.2. FGR-OSELM
The procedure of FGR-OSELM can be summarized as follows.
Online sequential learning phase:
In Equation (18), the approximate recursive calculation of the involved matrix makes FGR-OSELM unreliable under certain paradigms or parameter setups.
It should be noted that some online sequential RELMs with a forgetting factor, such as SF-ELM [34], RFOS-ELM [35], WOS-ELM [36], DU-OS-ELM [37] and FReOS-ELM [38], take Equation (10) as the cost function but adopt Equations (8) and (9), or their equivalent forms, as the recursive solution.
3. Proposed CF-OSELM-PRFF
FP-ELM is a stable online sequential training algorithm for SLFN which takes the timeliness of samples into consideration and can circumvent the potential phenomenon of data saturation. However, the calculation of the inverse of λI + K_k in Equation (15) is time-consuming in every working period. FGR-OSELM instead computes an approximate inverse recursively by Equations (18) and (19), which saves time but may render the algorithm unstable. In order to speed up FP-ELM, this work proposes an approach that solves for β_k quickly using a Cholesky decomposition. The complete algorithm is termed CF-OSELM-PRFF and is described in the sequel.
Then, Equation (11) can be rewritten as
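The rewritten form is presumably the L × L linear system

(λI + K_k) β_k = Q_k,

a reconstruction inferred from the later steps, which factorize λI + K_k and use Q_k as the right-hand side; the equation itself is not reproduced in this excerpt.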
Proposition 1. The matrix λI + K_k is a symmetric positive definite matrix.
Proof.
Symmetry. Apparently, K_0 is symmetric. Assume that K_{k−1} is symmetric; then
According to mathematical induction, K_k is symmetric for any k. As a result, λI + K_k is symmetric.
Positive definiteness. For any ζ = [ζ_1, ζ_2, …, ζ_L]^T ≠ 0, it holds that
Suppose that K_{k−1} is positive semi-definite, that is,
then
Similar to Equation (25), it holds that
In conclusion, λI + K_k is a symmetric positive definite matrix.
Denote B = λI + K_k; then B can be uniquely factorized into Cholesky form, i.e., B = U^T U, where U is an upper triangular matrix. U can be calculated by the following formulas [40]:
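For reference, the standard element-wise formulas for the upper-triangular Cholesky factor U of a symmetric positive definite matrix B (so that B = U^T U) are as follows; these are the textbook expressions and stand in for the omitted Equations (31) and (32):

u_11 = sqrt(b_11),    u_1j = b_1j / u_11   for j = 2, …, L,
u_ii = sqrt( b_ii − Σ_{m=1}^{i−1} u_mi^2 ),
u_ij = ( b_ij − Σ_{m=1}^{i−1} u_mi · u_mj ) / u_ii   for j > i,
u_ij = 0   for j < i.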
Equation (23) can be solved by means of the following two equations:
Denote Q_k = [q_1, q_2, …, q_L]^T. Utilizing the back-substitution method, the solution to Equation (23), viz., the coefficients β_i in Equation (1), can be obtained as follows:
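A hedged reconstruction of the two triangular solves: with B = U^T U, the system (λI + K_k)β_k = Q_k splits into U^T z = Q_k (solved by forward substitution) and U β_k = z (solved by back substitution). In element form, these standard formulas stand in for the omitted equations:

z_1 = q_1 / u_11,    z_i = ( q_i − Σ_{m=1}^{i−1} u_mi · z_m ) / u_ii,   i = 2, …, L,
β_L = z_L / u_LL,    β_i = ( z_i − Σ_{m=i+1}^{L} u_im · β_m ) / u_ii,   i = L−1, …, 1.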
The CF-OSELM-PRFF algorithm can be summarized as follows.
Step 1: Preparation.
(1) Choose the hidden output function G(a, b, x) of the SLFN and the number of hidden nodes L; determine λ and μ.
(2) Randomly assign the hidden parameters (a_i, b_i), i = 1, 2, …, L.
Step 2: Initialization.
(1) Acquire the initial data chunk S_0 (N_0 ≥ 2).
(2) Calculate H_0 and Y_0.
(3) Calculate K_0 by Equation (12) and Q_0 by Equation (21).
(4) Calculate the Cholesky factor U of λI + K_0 by Equations (31) and (32).
(5) Calculate β_0 by Equations (35) and (36).
(6) For an input x, the predicted output value is ŷ = h(x)β_0.
Step 3: Online modeling and prediction, i.e., repeat the following substeps.
(1) Acquire the kth (k ≥ 1) data chunk S_k.
(2) Calculate H_k and Y_k.
(3) Calculate K_k by Equation (14) and Q_k by Equation (22).
(4) Calculate the Cholesky factor U of λI + K_k by Equations (31) and (32).
(5) Calculate β_k by Equations (35) and (36).
(6) For an input x, the predicted output value is ŷ = h(x)β_k.
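The steps above can be rendered as a short Python sketch under the same assumed recursions as the earlier FP-ELM snippet; the chunk handling, sigmoid activation and helper names are illustrative rather than the paper's code. The only difference from the FP-ELM sketch is that the symmetric positive definite system (λI + K_k)β_k = Q_k is solved via a Cholesky factorization and two triangular solves instead of an explicit inverse.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def hidden_output(X, A, b):
    """Sigmoid hidden-layer mapping h(x) for a chunk of inputs X (rows are samples)."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def cf_oselm_prff_step(K_prev, Q_prev, H_k, Y_k, lam=1e-3, mu=0.99):
    """One online update: discount, accumulate, then Cholesky-solve for beta_k."""
    K_k = mu * K_prev + H_k.T @ H_k
    Q_k = mu * Q_prev + H_k.T @ Y_k
    U = cholesky(lam * np.eye(K_k.shape[0]) + K_k, lower=False)   # B = U^T U
    z = solve_triangular(U, Q_k, trans='T', lower=False)          # forward solve: U^T z = Q_k
    beta_k = solve_triangular(U, z, lower=False)                  # back solve:    U beta_k = z
    return K_k, Q_k, beta_k

# Hypothetical usage: initialize with the first chunk (H_0, Y_0), then stream chunks.
# K, Q, beta = cf_oselm_prff_step(np.zeros((L, L)), np.zeros(L), H_0, Y_0)
# for H_k, Y_k in stream_of_chunks:
#     K, Q, beta = cf_oselm_prff_step(K, Q, H_k, Y_k)
#     y_hat = hidden_output(x_new, A, b) @ beta   # prediction for a new input

Factorizing once and performing two triangular solves is markedly cheaper than computing (λI + K_k)^(−1) explicitly, which is consistent with the speedups over FP-ELM reported in Section 4.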
4. Experimental Results and Analysis
In this section, the performance of the presented CF-OSELM-PRFF is verified on a time-varying nonlinear process identification task, two chaotic time series and one electricity demand prediction task. These simulations assess the computational complexity (running time) and accuracy of CF-OSELM-PRFF in comparison with FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM [21]. FOS-MELM is an online sequential multiple-hidden-layer extreme learning machine with a forgetting mechanism, recently proposed by Xiao et al. To make the results of FOS-MELM more stable, a regularization term is introduced into its solving process according to [20].
For these online algorithms, the common regularization parameter λ is set to 0.001, and the forgetting factor is set to μ = 0.99. For AFGR-OSELM, the adaptive forgetting factor is tuned in the interval [0.8, 0.999] with an initial value of 0.995, and the other algorithm-specific parameters are set according to [28].
The output of each hidden node with respect to the input x of the SLFN in Equation (1) is set as the sigmoid function, i.e., G(a, b, x) = 1/(1 + exp(−(a·x + b))); the components of a, i.e., the input weights, and the bias b are randomly chosen from the range [−1, 1]. In particular, the hyperbolic tangent function G(a, b, x) = (1 − exp(−(a·x + b)))/(1 + exp(−(a·x + b))) is selected as the activation function in FOS-MELM.
For FOS-MELM, every training data chunk contains only one sample, and each sample remains valid for s time units; the parameter s is set as s = N0.
In order to observe the performance of these approaches under various situations, the number L of hidden nodes of the SLFN is set to 25, 50, 100, 150 and 200, and the corresponding number N0 of initial training samples is set to 50, 100, 200, 200 and 200, respectively.
The root mean square error (RMSE) of prediction is regarded as the measurement index of model accuracy.
The relative efficiencies of CF-OSELM-PRFF with respect to its counterparts are measured by speedup ratios. The speedups of CF-OSELM-PRFF over the other related methods are defined as
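Since the defining formulas are not reproduced in this excerpt, the standard definitions assumed here are

RMSE = sqrt( (1/N_p) · Σ_{i=1}^{N_p} (ŷ_i − y_i)^2 ),    speedup_M = t_M / t_CF-OSELM-PRFF,

where N_p is the number of predicted points, ŷ_i and y_i are the predicted and real values, t_M is the running time of a compared method M, and t_CF-OSELM-PRFF is the running time of the proposed method.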
All the performance assessments were carried out in MATLAB R2010b 32-bit environment running on Windows 7 32-bit with Intel Core i3-3220 3.3 GHz CPU and 4 GB RAM.
4.1. Time-varying Nonlinear Process Identification
The identified unknown system is a modified version of the one addressed in [41]; by changing the constant and the coefficients of the variables, the time-varying system is expressed as follows:
The system (42) can be expressed as follows:
where f(x) is a nonlinear function, and x(k) is the regression input data vector
with ny, nd and nu being model structure parameters; they are set as ny = 3, nd = 1 and nu = 2 here. When the SLFN is applied to approximate Equation (40), (x(k), y(k)) is the learning sample (x_k, y_k) of the SLFN.
Denote k0 = N0 + max(ny, nu) − nd and k1 = k0 + 500. The system input is set as follows:
where rand( ) generates random numbers uniformly distributed in the interval (0, 1).
The simulations are carried out for different numbers of steps and hidden nodes. An efficiency comparison between CF-OSELM-PRFF and FP-ELM, FGR-OSELM, AFGR-OSELM, together with FOS-MELM, is listed in Table 1. Due to the randomness of the parameters a, b and of u(k) during the initial stage (k ≤ k0), along with the intrinsic uncertainty of the computing environment, the test results of the simulations inevitably vary. Consequently, for each case, every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is marked in bold.
Table 1 shows that, with the same number of hidden nodes and the same number of simulation prediction steps, CF-OSELM-PRFF statistically costs the least time among the five approaches; therefore, CF-OSELM-PRFF has an obvious speed advantage over FP-ELM. Moreover, as the number of hidden nodes increases, the speedup tends to become larger. FOS-MELM trains a feedforward neural network with three hidden layers and involves complex calculation steps; thus, it costs the most time.
Table 2 displays a prediction RMSE comparison of CF-OSELM-PRFF with its counterparts. Every RMSE is also an average over 5 independent trials of each algorithm performing the set number of steps. In each set of results, the best one is marked in bold. From Table 2, it can be seen that there is no apparent difference in predictive accuracy among FP-ELM, FOS-MELM and CF-OSELM-PRFF; in some setups, FGR-OSELM and AFGR-OSELM can provide satisfactory, or even the best, forecasts, but they cannot work optimally in certain cases. When the number of simulation prediction steps is set to 2500 or 3000, FGR-OSELM sometimes produces bad or low-accuracy predictions in the later stages of the simulation process, and therefore the RMSE of FGR-OSELM becomes very large. Additionally, when the number of hidden nodes is set to 100, 150 or 200 and the number of simulation prediction steps is set to 1000 or more, AFGR-OSELM usually runs unstably and produces a very large RMSE. The reason is that, in the recursive formulas of FGR-OSELM and AFGR-OSELM, the approximate calculation of the inverse of the related matrix yields errors which may propagate and accumulate; as a result, the algorithms are apt to produce unreliable results.
To intuitively observe and compare the accuracy and stability of these online sequential RELMs, fix L = 25, N0 = 50 and the number of simulation prediction steps at 3000, execute FGR-OSELM repeatedly until a certain unstable predictive scenario occurs, plot the corresponding prediction error (predicted value minus real value) curves in Figure 1a, and save the current a, b values and the initial u(k) (k ≤ k0) signal, which are referred to as the adverse a, b values and initial u(k) signal. Subsequently, execute FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values and initial u(k) signal, respectively, and plot the prediction error curves of the three approaches in Figure 1b–d. Clearly, Figure 1a shows that the prediction errors of FGR-OSELM become very large and completely meaningless when the prediction step exceeds a certain limit. There are a few large peaks at certain instants which reveal the instability of FGR-OSELM arising from its recursive approximate calculation. In order to explicitly exhibit the variation of the prediction error of FGR-OSELM, only the partial results within the first 2453 steps are presented. Additionally, Figure 1b–d show that the prediction effect of CF-OSELM-PRFF is similar to that of FP-ELM and FOS-MELM; in other words, they possess the same ideal predictive performance.
Set L = 100, N0 = 200 and the number of simulation prediction steps to 3000, execute AFGR-OSELM repeatedly until a certain unstable predictive scenario occurs, plot the corresponding prediction error curves in Figure 2a, and save the current a, b values and initial u(k) (k ≤ k0) signal, i.e., the adverse a, b values and initial u(k) signal. Then, execute FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values and initial u(k) signal, respectively, and plot the prediction error curves of the three approaches in Figure 2b–d. Clearly, Figure 2a shows that the prediction errors of AFGR-OSELM become larger at the 872nd step, reaching −67.7839. Actually, at the 878th step, the prediction error of AFGR-OSELM suddenly reaches −25286.1257. Excessively large prediction errors have not been marked in Figure 2a; thus, only the partial results prior to the 878th step are presented. Additionally, Figure 2b,c show that CF-OSELM-PRFF possesses the same excellent predictive performance as FP-ELM. Figure 2d seems to show that FOS-MELM is slightly better than CF-OSELM-PRFF, but there is only a small difference between their RMSEs.
The above experiments indicate that, for the parameter settings with which FGR-OSELM or AFGR-OSELM produces large prediction errors, CF-OSELM-PRFF, FP-ELM and FOS-MELM can run stably and provide satisfactory predictions.
4.2. Lorenz Time Series Prediction
The Lorenz time series comes from a three-dimensional dynamical system that exhibits chaotic flow, described by the following equations [42,43]:
where x(t), y(t) and z(t) are the values of the time series at time t. A typical choice of the parameter values is σ = 10, r = 28 and b = 8/3.
In this example, observations of the continuous x(t), y(t) and z(t) at a sequence of sampling time points are of interest. Hence, the function ode45, a standard solver for ordinary differential equations in MATLAB, is applied to generate sampling values of x(t), y(t) and z(t). The routine implements a fourth/fifth-order Runge–Kutta method with adaptive step size for efficient computation. The initial state is set as x(0) = 2, y(0) = 3 and z(0) = 4.
Let Ts denote the sampling period. At each sampling time point kTs, ([x(kTs − Ts), x(kTs − 2Ts), …, x(kTs − nyTs)], x(kTs)) is a training sample for the ELM. In one-step-ahead time series prediction, the common way is to use x_{k+1} = [x(kTs), x(kTs − Ts), …, x(kTs − (ny − 1)Ts)] to calculate the predicted value of x(kTs + Ts). In the simulation, the sampling period is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Lorenz time series is considered for prediction.
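As an illustration of how such training pairs can be generated, the sketch below uses SciPy's RK45 solver as a stand-in for MATLAB's ode45 (which the paper uses); the helper names, step count and slicing are illustrative assumptions.

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, r=28.0, b=8.0/3.0):
    # Standard Lorenz equations with the parameter values used in the paper.
    x, y, z = s
    return [sigma * (y - x), r * x - y - x * z, x * y - b * z]

Ts, ny, n_steps = 0.02, 3, 5000
t_eval = np.arange(0, n_steps * Ts, Ts)
sol = solve_ivp(lorenz, (0, t_eval[-1]), [2.0, 3.0, 4.0], t_eval=t_eval, rtol=1e-8)
x = sol.y[0]                                    # x-coordinate of the Lorenz series

# One-step-ahead samples: input = [x(k-1), ..., x(k-ny)] (most recent first), target = x(k).
X = np.column_stack([x[ny - 1 - j: len(x) - 1 - j] for j in range(ny)])
Y = x[ny:]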
In order to verify the computational efficiency of CF-OSELM-PRFF relative to FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM, the running times of these algorithms and the speedups of CF-OSELM-PRFF over the other algorithms are tabulated in Table 3. Every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. As seen in Table 3, CF-OSELM-PRFF clearly outperforms the other algorithms in terms of speed.
Table 4 shows the prediction RMSE of the five methods. Every RMSE is also an average over 5 independent trials. In each set of results, the best one is marked in bold. As shown in Table 4, on the whole, the prediction behaviors of these methods in this simulation are basically similar to those in the first simulation. When the number of simulation prediction steps is set to 3000, FGR-OSELM occasionally produces poor predictions in the later stage of the simulation process; thus, the RMSE of FGR-OSELM becomes larger. Unexpectedly, in many cases, AFGR-OSELM cannot provide reasonable predictions.
Analogously, an intuitive comparison of the prediction results of these algorithms is also made. Fix L = 200, N0 = 200 and the number of simulation prediction steps at 3000, run FGR-OSELM repeatedly until a certain unstable predictive scenario appears, illustrate the corresponding prediction error curves in Figure 3a, and save the current a, b values, i.e., the adverse a, b values. Subsequently, execute AFGR-OSELM, FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values, respectively; their prediction error curves are demonstrated in Figure 3b–e. Figure 3a shows that the prediction errors of FGR-OSELM become very large when the prediction step exceeds a certain limit. In Figure 3b, only the curve from the 450th step to the 540th step is plotted, because after the 540th step AFGR-OSELM produces excessively large prediction errors at many time points. The recursive approximate calculations of FGR-OSELM and AFGR-OSELM result in instability under certain settings. Figure 3c,d show that the prediction error curve of CF-OSELM-PRFF is extremely similar to that of FP-ELM. Comparing Figure 3d and e, it can be found that the prediction result of CF-OSELM-PRFF is slightly better than that of FOS-MELM.
Although normalizing the time series values, i.e., the input data of the SLFN, into the interval [0, 1] or [−1, 1] can significantly improve the stability of FGR-OSELM and AFGR-OSELM, it is difficult to obtain the maximum and minimum values of the input data in some practical online modeling scenarios. Thus, normalization of the input data is sometimes infeasible. In this example, CF-OSELM-PRFF and FP-ELM can train the SLFN and provide satisfactory results using raw data; they are less susceptible to the range of the input data than FGR-OSELM and AFGR-OSELM.
4.3. Rössler Time Series Prediction
The Rössler system is one of the most famous chaotic systems, though it is an artificial system designed solely for the purpose of creating a model of a strange attractor [44,45,46]. The Rössler time series is generated from the following differential equations:
where x(t), y(t) and z(t) are the values of the time series at time t, and d, e and f are the control parameters; they are set as d = 0.15, e = 0.2 and f = 10.
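The differential equations themselves are not reproduced in this excerpt. A common form of the Rössler system consistent with these parameter values is given below; the assignment of d, e and f to the three terms is an assumption based on the usual (a, b, c) parameterization:

dx/dt = −y(t) − z(t),
dy/dt = x(t) + d·y(t),
dz/dt = e + z(t)·(x(t) − f).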
The way to generate sampling values of x(t), y(t) and z(t) is the same as that described in the previous example. The initial condition is set as x(0) = 0.05, y(0) = 0.05 and z(0) = 0.05. The sampling interval is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Rössler system is considered for prediction.
In this simulation, the experimental design is the same as in the previous simulations. The running times of these algorithms and the speedups of CF-OSELM-PRFF over the other algorithms are recorded in Table 5. Every running time is an average over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 5, it is clear that CF-OSELM-PRFF is superior to the other methods in terms of efficiency.
Table 6 shows the prediction RMSE of the five methods. Every RMSE is an average over 5 independent trials. In each set of results, the best one is marked in bold. Different from the previous two experiments, here FGR-OSELM still occasionally behaves unstably in the later segment of the simulation process when the number of simulation prediction steps is set to 3000. Moreover, AFGR-OSELM behaves stably with higher probability; if AFGR-OSELM is run for only 5 successive trials, unreasonable prediction results rarely appear. Thus, its performance is not investigated in the following contrastive demonstration. Additionally, in many cases FOS-MELM achieves the best results; as the number of nodes increases, FOS-MELM yields higher accuracy than CF-OSELM-PRFF at the expense of requiring more time.
Accordingly, for the case of L = 25, N0 = 50 and 3000 simulation prediction steps, FGR-OSELM, CF-OSELM-PRFF, FP-ELM and FOS-MELM are run with the same adverse a, b values, respectively, and their prediction error curves are plotted in Figure 4. Figure 4a shows that FGR-OSELM works at first but fails afterwards. Figure 4b,c show that CF-OSELM-PRFF and FP-ELM provide almost the same good prediction results. Contrasting Figure 4c and d, it can be found that the error range of CF-OSELM-PRFF is smaller than that of FOS-MELM; in other words, the former yields better forecasts than the latter in the late stage.
4.4. Experiment on Real Data Set
Electricity load forecasting plays an important part in the strategic management of electric power systems. Here, an electricity demand time series (EDTS) [47] is utilized to test the performance of these online algorithms; the EDTS consists of a sequence of 15-minute averaged values of power demand. The first 3500 values of the EDTS are shown in Figure 5.
Before training the model, the data are normalized into [−1,1] by Equation (48); after forecasting, the predicted values are denormalized by Equation (49).
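Equations (48) and (49) are not reproduced in this excerpt; a common min–max scheme consistent with the described behaviour (mapping into [−1, 1] and back) is sketched below as an assumption, not as the paper's exact formulas:

def normalize(x, x_min, x_max):
    # Map raw values into [-1, 1] (assumed form of Equation (48)).
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def denormalize(x_norm, x_min, x_max):
    # Map predictions back to the original scale (assumed form of Equation (49)).
    return 0.5 * (x_norm + 1.0) * (x_max - x_min) + x_min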
In this example, the experiment is designed like the previous ones. The running times of these algorithms and the speedups of CF-OSELM-PRFF over its counterparts are recorded in Table 7. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 7, it is clear that CF-OSELM-PRFF runs faster than the other methods.
Table 8 shows the prediction RMSE of the five methods. In each set of results, the best one is marked in bold. CF-OSELM-PRFF produces nearly the same level of accuracy as FP-ELM, and statistically higher accuracy than FOS-MELM. FGR-OSELM runs unstably in only one case, but AFGR-OSELM does so in many cases. In addition, unlike in the previous example, FOS-MELM does not achieve the best results here; CF-OSELM-PRFF yields higher accuracy than FOS-MELM in many cases.
4.5. Discussion
The above experiments show that CF-OSELM-PRFF has greater time efficiency than several other related approaches; the speedup ratio of CF-OSELM-PRFF relative to the other approaches is mainly influenced by the number of hidden nodes of the SLFN. CF-OSELM-PRFF achieves a speedup of around 1.4 to 2.0 over FP-ELM. This speedup can facilitate its use in real applications, and even in real-time applications.
CF-OSELM-PRFF can provide the same predictive accuracy as FP-ELM, and better stability than FGR-OSELM and AFGR-OSELM. Additionally, the experiments also show that there is not an obvious difference between the predictive accuracy of CF-OSELM-PRFF and that of FOS-MELM. In the third simulation, FOS-MELM outperformed CF-OSELM-PRFF statistically, whereas, CF-OSELM-PRFF surpassed FOS-MELM in the next one.
CF-OSELM-PRFF can learn arriving data one-by-one or chunk-by-chunk without the need for storing the training samples accumulated thus far; it is suitable for storage capacity-constrained computing devices.
In the above experiments, CF-OSELM-PRFF adopted a fixed forgetting factor to reduce the contribution of old samples; in fact, it could also incorporate variable forgetting factor techniques, such as that reported in [39].
5. Conclusions
Regularization plays an important role in RELM, but in the cost function or recursive solution of many online sequential RELMs, the regularization effect decays gradually over time. Fortunately, FP-ELM, FGR-OSELM and AFGR-OSELM can maintain a persistent regularization effect throughout the whole learning process. They share the same cost function but employ different solving processes.
This paper makes full use of the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF, and factorizes the matrix in Cholesky form to solve the equations at every prediction step. On this basis, a new solving method for OSELM-PRFF, i.e., CF-OSELM-PRFF, is developed. The proposed method is fast and reliable; it is well suited to fast, and even real-time, modeling of time-varying nonlinear systems.
The regularization term in CF-OSELM-PRFF does not decay over time, but a constant regularization parameter limits the adaptability of CF-OSELM-PRFF. Therefore, it would be worthwhile to design a highly efficient method for adjusting the regularization parameter.