Article

Cholesky Factorization Based Online Sequential Extreme Learning Machines with Persistent Regularization and Forgetting Factor

Xinran Zhou and Xiaoyan Kui *
School of Computer Science and Engineering, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
Symmetry 2019, 11(6), 801; https://doi.org/10.3390/sym11060801
Submission received: 18 May 2019 / Revised: 8 June 2019 / Accepted: 10 June 2019 / Published: 17 June 2019

Abstract:
The online sequential extreme learning machine with persistent regularization and forgetting factor (OSELM-PRFF) can avoid the potential singularity or ill-posed problem of online sequential regularized extreme learning machines with forgetting factors (FR-OSELM), and is particularly suitable for modelling in non-stationary environments. However, existing algorithms for OSELM-PRFF are either time-consuming or unstable in certain paradigms or parameter setups. This paper presents a novel algorithm for OSELM-PRFF, named "Cholesky factorization based" OSELM-PRFF (CF-OSELM-PRFF), which recurrently constructs an equation for the extreme learning machine and efficiently solves it via Cholesky factorization during every cycle. CF-OSELM-PRFF deals with the timeliness of samples through the forgetting factor, and the regularization term in its cost function works persistently. CF-OSELM-PRFF can learn data one-by-one or chunk-by-chunk with a fixed or varying chunk size. Detailed performance comparisons between CF-OSELM-PRFF and relevant approaches are carried out on several regression problems. The numerical simulation results show that CF-OSELM-PRFF demonstrates higher computational efficiency than its counterparts and can yield stable predictions.

1. Introduction

Single hidden-layer feedforward neural networks (SLFN) can approximate any function and form decision boundaries with arbitrary shapes if the activation function is chosen properly [1,2,3]. To train SLFN quickly, Huang et al. proposed a learning algorithm called the "Extreme Learning Machine" (ELM), which randomly assigns the hidden node parameters and then determines the output weights by the Moore–Penrose generalized inverse [4,5,6]. ELM has been successfully applied to many real-world applications, such as retinal vessel segmentation [7], wind speed forecasting [8,9], water network management [10], path-tracking of autonomous mobile robots [11], modelling of drying processes [12], bearing fault diagnosis [13], cybersecurity defense frameworks [14], crop classification [15], and energy disaggregation [16]. In recent years, ELM has been extended to multilayer ELMs, which play an important role in the deep learning domain [17,18,19,20,21,22,23].
The original ELM is a batch learning algorithm; all samples must be available before ELM trains the SLFN. Whenever new data arrive, ELM has to gather the old and new data together and retrain the SLFN to incorporate the new information. This is a very time-consuming process, and is even computationally infeasible in applications where frequent and fast training, or even real-time training, is required; moreover, hardware systems cannot provide enough memory to store an ever-increasing amount of training data. To deal with problems involving sequential data, Liang et al. proposed the online sequential ELM (OS-ELM), which learns data one-by-one or chunk-by-chunk with a fixed or varying chunk size [24]. OS-ELM can be implemented in common programming languages and run on universal computing platforms. Moreover, in order to execute OS-ELM quickly, Frances-Villora et al. developed an FPGA-based implementation of a tailored OS-ELM algorithm [25], which assumes a one-by-one training strategy. OS-ELM has been successfully adopted in some applications, but it still has some drawbacks. Firstly, OS-ELM may encounter ill-conditioning problems, resulting in fluctuating generalization performance of the SLFN, if the number of hidden nodes L is not set appropriately [26,27,28]. Secondly, OS-ELM does not take the timeliness of samples into account, so it cannot be directly employed in time-varying or nonstationary environments.
As a variant of ELM, the Regularized ELM (RELM) [29,30], which is mathematically equivalent to the constrained optimization-based ELM [31], can achieve better generalization performance than ELM, can greatly reduce the randomness effect in ELM [32,33], and is less sensitive to L. Furthermore, several online sequential RELMs have been developed. Huynh and Won proposed ReOS-ELM [26]. Despite its widespread application, ReOS-ELM does not consider the timeliness of samples. To take this into account, Zhang et al. and Du et al. separately designed online sequential RELMs with a forgetting factor, viz., SF-ELM [34] and RFOS-ELM [35]; Guo and Xu referred to them as FR-OSELM [27,28]. After stating the real optimization cost function of FR-OSELM and analyzing it theoretically, Guo and Xu pointed out that the regularization term in the cost function of FR-OSELM is forgotten and tends to zero as time passes; thus, FR-OSELM will probably run into ill-conditioning problems and become unstable after a long period. Incidentally, a similar or identical optimization cost function, or a recursive solution in which the regularization term wanes gradually with time, is still utilized in [36,37,38].
Recently, online sequential extreme learning machines with persistent regularization and forgetting factors (OSELM-PRFF) were put forward [27,28,39]; these can avoid the potential singularity or ill-posed problem of FR-OSELM. Moreover, two kinds of recursive calculation schemes for OSELM-PRFF have been developed. One is FP-ELM, which directly calculates the precise inverse of the involved matrix during every model update [39]; the other includes FGR-OSELM [27] and AFGR-OSELM [28], which compute a recursive approximation of the inverse to reduce the computational burden. These online sequential RELMs have been applied successfully to several examples. However, although the recursive calculation of an approximate inverse matrix in FGR-OSELM and AFGR-OSELM enhances efficiency, it may cause them to be unreliable in certain paradigms or parameter setups. Additionally, the direct calculation of the precise inverse matrix makes FP-ELM inefficient.
The reliability and time efficiency of online learning algorithms are, in general, two important indexes. In real-time applications, such as stock forecasting, modelling of controlled objects and signal processing, the computational efficiency of the online training algorithm for SLFN is a crucial factor. Here, a new online sequential extreme learning machine with persistent regularization and forgetting factor using Cholesky factorization (CF-OSELM-PRFF) is presented. This paper analyzes and proves the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF. The presented method decomposes the coefficient matrix in Cholesky form during every model-updating period, transforms the linear system into two systems with lower and upper triangular coefficient matrices, respectively, and applies forward and backward substitution to solve them. The computational efficiency and prediction accuracy of CF-OSELM-PRFF are evaluated on process identification, classical time series prediction, and real electricity load forecasting. The numerical experiments indicate that CF-OSELM-PRFF runs faster than several other representative methods and can provide accurate predictions.
The rest of this paper is organized as follows. Section 2 gives a brief review of RELM, FR-OSELM and the existing OSELM-PRFF. Section 3 proposes CF-OSELM-PRFF. Performance evaluation is conducted in Section 4. Finally, conclusions are given in Section 5.

2. Brief Review of Related Work

2.1. The RELM

For simplicity, the ELM-based learning algorithm for an SLFN with multiple inputs and a single output is discussed. The output of an SLFN with L hidden nodes can be represented by
$$f(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x) = h(x)\beta, \quad x \in \mathbb{R}^n, \ a_i \in \mathbb{R}^n \qquad (1)$$
where ai and bi are the learning parameters of the hidden nodes, β = [β1, β2,…, βL]T is the vector of output weights, and G(ai, bi, x) denotes the output (activation function value) of the i-th hidden node with respect to the input x. h(x) = [G(a1, b1, x), G(a2, b2, x),…, G(aL, bL, x)] is a feature mapping from the n-dimensional input space to the L-dimensional hidden-layer feature space. In ELM, ai and bi are first determined randomly.
For a given set of N distinct training samples $\{(x_i, y_i)\}_{i=1}^{N} \subset \mathbb{R}^n \times \mathbb{R}$, where xi is an n-dimensional input vector and yi is the corresponding scalar observation, RELM may be formulated as [29,30]
$$\mathrm{Minimize}\ \ L_{RELM}(\beta) = \frac{\lambda}{2}\|\beta\|^2 + \frac{1}{2}\|H\beta - Y\|^2, \qquad (2)$$
where Y = [y1, y2,…, yN]T indicates the target value of all the samples. H=[h(x1)T, h(x2)T,…, h(xN)T]T is the mapping matrix for the inputs of all the samples. λ is the regularization parameter.
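To make the batch formulation concrete, the following is a minimal NumPy sketch of RELM training under the above notation; the sigmoid activation, the toy sine data, and all function and variable names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def train_relm(X, Y, L=25, lam=0.001, seed=0):
    """Batch RELM (Equation (2)): randomly assign hidden parameters (a_i, b_i),
    then solve (lam*I + H^T H) beta = H^T Y for the output weights."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    A = rng.uniform(-1.0, 1.0, (L, n))          # input weights a_i
    b = rng.uniform(-1.0, 1.0, L)               # biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))    # N x L hidden-layer output matrix
    beta = np.linalg.solve(lam * np.eye(L) + H.T @ H, H.T @ Y)
    return A, b, beta

def relm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return H @ beta

# toy usage: learn y = sin(x) from noisy one-dimensional samples
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
Y = np.sin(X).ravel() + 0.05 * np.random.default_rng(1).standard_normal(200)
A, b, beta = train_relm(X, Y)
print(relm_predict(X[:3], A, b, beta))
```

Solving the L × L system with np.linalg.solve avoids forming an explicit inverse, which is the same consideration that motivates the Cholesky-based solver of Section 3.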

2.2. FR-OSELM

For time k, the FR-OSELM algorithm is equivalent to minimizing the following cost function:
$$L_{FR}(\beta_k) = \frac{\lambda}{2}\prod_{j=1}^{k}\mu_j\|\beta_k\|^2 + \frac{1}{2}\Big(\prod_{j=1}^{k}\mu_j\|H_0\beta_k - Y_0\|^2 + \prod_{j=2}^{k}\mu_j\|H_1\beta_k - Y_1\|^2 + \cdots + \mu_k\|H_{k-1}\beta_k - Y_{k-1}\|^2 + \|H_k\beta_k - Y_k\|^2\Big) = \frac{\lambda}{2}\prod_{j=1}^{k}\mu_j\|\beta_k\|^2 + \frac{1}{2}\sum_{i=0}^{k}\Big(\prod_{j=i+1}^{k}\mu_j\|H_i\beta_k - Y_i\|^2\Big). \qquad (3)$$
where the subscripts i, j, k denote time points, and μ is the forgetting factor.
The partial derivative of the objective function with respect to βk is
$$\frac{\partial L_{FR}(\beta_k)}{\partial \beta_k} = \lambda\prod_{j=1}^{k}\mu_j\,\beta_k + \sum_{i=0}^{k}\Big[\prod_{j=i+1}^{k}\mu_j\,H_i^T(H_i\beta_k - Y_i)\Big] = \Big(\lambda\prod_{j=1}^{k}\mu_j I + \sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T H_i\Big)\beta_k - \sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T Y_i \qquad (4)$$
Setting $\partial L_{FR}(\beta_k)/\partial \beta_k = 0$, βk can be obtained as follows:
$$\beta_k = \Big(\lambda\prod_{j=1}^{k}\mu_j I + \sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T H_i\Big)^{-1}\Big(\sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T Y_i\Big). \qquad (5)$$
Denote
$$P_k = \Big(\lambda\prod_{j=1}^{k}\mu_j I + \sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T H_i\Big)^{-1}, \qquad (6)$$
then,
$$P_k^{-1} = \lambda\prod_{j=1}^{k}\mu_j I + \sum_{i=0}^{k-1}\prod_{j=i+1}^{k}\mu_j H_i^T H_i + H_k^T H_k = \mu_k\Big(\lambda\prod_{j=1}^{k-1}\mu_j I + \sum_{i=0}^{k-1}\prod_{j=i+1}^{k-1}\mu_j H_i^T H_i\Big) + H_k^T H_k = \mu_k P_{k-1}^{-1} + H_k^T H_k \qquad (7)$$
Inverting both sides of Equation (7) and applying the Sherman–Morrison–Woodbury formula, Pk can be calculated as follows:
$$P_k = \big(\mu_k P_{k-1}^{-1} + H_k^T H_k\big)^{-1} = \frac{1}{\mu_k}\left(P_{k-1} - \frac{P_{k-1} H_k^T H_k P_{k-1}}{\mu_k + H_k P_{k-1} H_k^T}\right), \qquad (8)$$
Substituting Equation (6) into Equation (5) yields
$$\beta_k = P_k\Big(\sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T Y_i\Big) = P_k\Big(\mu_k\sum_{i=0}^{k-1}\prod_{j=i+1}^{k-1}\mu_j H_i^T Y_i + H_k^T Y_k\Big) = P_k\Big(\mu_k P_{k-1}^{-1} P_{k-1}\sum_{i=0}^{k-1}\prod_{j=i+1}^{k-1}\mu_j H_i^T Y_i + H_k^T Y_k\Big) = P_k\big((P_k^{-1} - H_k^T H_k)\beta_{k-1} + H_k^T Y_k\big) = \beta_{k-1} + P_k H_k^T (Y_k - H_k\beta_{k-1}) \qquad (9)$$
It is obvious that the regularization term $\frac{\lambda}{2}\prod_{j=1}^{k}\mu_j\|\beta_k\|^2$ in the cost function of FR-OSELM will be forgotten and tends to zero as k increases; thus, FR-OSELM will probably run into ill-conditioning problems and become unstable after a long period [32,33].
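As a rough illustration of how Equations (8) and (9) are applied online, the sketch below performs one FR-OSELM update step; it is written for a general data chunk (with a one-sample chunk the inner solve reduces to the scalar denominator of Equation (8)), and the function name, shapes and defaults are assumptions for illustration only.

```python
import numpy as np

def fr_oselm_update(P_prev, beta_prev, H_k, Y_k, mu=0.99):
    """One FR-OSELM step (Equations (8) and (9)): update P_k via the
    Sherman-Morrison-Woodbury identity, then correct the output weights.
    H_k has shape (N_k, L); with N_k = 1 the inner solve reduces to the
    scalar denominator written in Equation (8)."""
    N_k = H_k.shape[0]
    S = mu * np.eye(N_k) + H_k @ P_prev @ H_k.T                    # N_k x N_k
    P_k = (P_prev - P_prev @ H_k.T @ np.linalg.solve(S, H_k @ P_prev)) / mu
    beta_k = beta_prev + P_k @ H_k.T @ (Y_k - H_k @ beta_prev)
    return P_k, beta_k
```

Because the regularization contribution inside Pk is multiplied by the forgetting factor at every step, this recursion gradually loses its regularization, which is exactly the weakness noted above.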

2.3. The Existing Algorithms for OSELM-PRFF

Recently, some papers [27,28,39] take the following cost function in online sequential RELM with forgetting factor:
$$L(\beta_k) = \frac{\lambda}{2}\|\beta_k\|^2 + \frac{1}{2}\sum_{i=0}^{k}\Big(\prod_{j=i+1}^{k}\mu_j\|H_i\beta_k - Y_i\|^2\Big), \qquad (10)$$
Setting $\partial L(\beta_k)/\partial \beta_k = 0$, βk can be obtained as follows:
$$\beta_k = \Big(\lambda I + \sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T H_i\Big)^{-1}\Big(\sum_{i=0}^{k}\prod_{j=i+1}^{k}\mu_j H_i^T Y_i\Big). \qquad (11)$$
Here, the regularization term is not forgotten over time. Moreover, two kinds of recursive calculation approaches for βk, i.e., FP-ELM and FGR-OSELM, have been proposed.

2.3.1. FP-ELM

The main procedure of FP-ELM can be summarized as follows.
For an initial data chunk $S_0 = \{(x_i, y_i)\}_{i=1}^{N_0}$, let
$$K_0 = H_0^T H_0, \qquad (12)$$
according to Equation (11), the initial network output weight is
$$\beta_0 = (\lambda I + K_0)^{-1} H_0^T Y_0 \qquad (13)$$
When a new data chunk $S_k = \{(x_i, y_i)\}_{i=\sum_{j=0}^{k-1}N_j+1}^{\sum_{j=0}^{k}N_j}$ (k ≥ 1) arrives, the recursive way of updating the output weights can be written as
$$K_k = \mu_k K_{k-1} + H_k^T H_k, \qquad (14)$$
$$\beta_k = \beta_{k-1} + (\lambda I + K_k)^{-1}\big(H_k^T(Y_k - H_k\beta_{k-1}) - \lambda(1-\mu_k)\beta_{k-1}\big). \qquad (15)$$
In Equation (15), the calculation of the matrix inverse, i.e., (λI + Kk)−1, makes FP-ELM time-consuming.
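A minimal sketch of one FP-ELM update step (Equations (14) and (15)) might look as follows; the function name, parameter defaults and the use of np.linalg.solve in place of an explicit inverse are illustrative assumptions.

```python
import numpy as np

def fp_elm_update(K_prev, beta_prev, H_k, Y_k, lam=0.001, mu=0.99):
    """One FP-ELM step (Equations (14) and (15)). The dense solve with
    lam*I + K_k at every step is the costly part of FP-ELM."""
    K_k = mu * K_prev + H_k.T @ H_k
    L = K_k.shape[0]
    rhs = H_k.T @ (Y_k - H_k @ beta_prev) - lam * (1.0 - mu) * beta_prev
    beta_k = beta_prev + np.linalg.solve(lam * np.eye(L) + K_k, rhs)
    return K_k, beta_k
```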

2.3.2. FGR-OSELM

The procedure of FGR-OSELM can be summarized as follows.
Initialization phase:
$$P_0 = (\lambda I + H_0^T H_0)^{-1}. \qquad (16)$$
$$\beta_0 = P_0 H_0^T Y_0. \qquad (17)$$
Online sequential learning phase:
$$P_k^* = \frac{1}{\mu}P_{k-1} - \frac{\lambda(1-\mu)}{\mu^2}P_{k-1}\Big(I + \frac{\lambda(1-\mu)}{\mu}P_{k-1}\Big)^{-1}P_{k-1} \approx \frac{1}{\mu}P_{k-1} - \frac{\lambda(1-\mu)}{\mu^2}P_{k-1}\Big(I - \frac{\lambda(1-\mu)}{\mu}P_{k-1}\Big)P_{k-1} \qquad (18)$$
$$P_k = P_k^* - \frac{P_k^* H_k^T H_k P_k^*}{1 + H_k P_k^* H_k^T} \qquad (19)$$
$$\beta_k = \beta_{k-1} + P_k H_k^T (Y_k - H_k\beta_{k-1}) - \lambda(1-\mu)P_k\beta_{k-1} \qquad (20)$$
In Equation (18), the approximate calculation of the matrix $P_k^*$ makes FGR-OSELM unreliable in certain paradigms or parameter setups.
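For comparison, a sketch of one FGR-OSELM step (Equations (18)–(20)) in one-by-one learning mode is given below; the names, shapes and defaults are assumptions, and the point of interest is the first-order approximation of the inverse in Equation (18), which is the source of the potential unreliability discussed above.

```python
import numpy as np

def fgr_oselm_update(P_prev, beta_prev, h_k, y_k, lam=0.001, mu=0.99):
    """One FGR-OSELM step (Equations (18)-(20)) in one-by-one mode:
    h_k is a single hidden-output row of shape (1, L), y_k a scalar target."""
    L = P_prev.shape[0]
    c = lam * (1.0 - mu) / mu
    # Equation (18): first-order approximation of the inverse
    P_star = P_prev / mu - (c / mu) * P_prev @ (np.eye(L) - c * P_prev) @ P_prev
    # Equation (19): rank-one correction with a scalar denominator
    denom = 1.0 + float(h_k @ P_star @ h_k.T)
    P_k = P_star - (P_star @ h_k.T @ h_k @ P_star) / denom
    # Equation (20): output-weight update
    err = np.atleast_1d(y_k) - h_k @ beta_prev
    beta_k = beta_prev + P_k @ h_k.T @ err - lam * (1.0 - mu) * (P_k @ beta_prev)
    return P_k, beta_k
```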
It should be noted that some online sequential RELMs with forgetting factors, such as SF-ELM [34], RFOS-ELM [35], WOS-ELM [36], DU-OS-ELM [37], and FReOS-ELM [38], take Equation (10) as the cost function, but adopt Equations (8) and (9) or their equivalent forms as recursive solutions.

3. Proposed CF-OSELM-PRFF

FP-ELM is a stable online sequential training algorithm for SLFN which takes the timeliness of samples into consideration and can circumvent the potential phenomenon of data saturation. However, the calculation of the inverse of λI + Kk in Equation (15) is time-consuming in every working period. FGR-OSELM calculates an approximate inverse recursively by Equations (18) and (19), which saves time but might render the algorithm unstable. In order to speed up FP-ELM, this work proposes an approach that solves for βk quickly using a Cholesky decomposition trick. The complete algorithm is termed CF-OSELM-PRFF and is described in the sequel.
Let
$$Q_0 = H_0^T Y_0, \qquad (21)$$
$$Q_k = \mu_k Q_{k-1} + H_k^T Y_k. \qquad (22)$$
Then, Equation (11) can be rewritten as
$$(\lambda I + K_k)\beta_k = Q_k. \qquad (23)$$
Proposition 1.
The matrix λI + Kk is a symmetric positive definite matrix.
Proof. Symmetry: Apparently, K0 is symmetric. Assume Kk−1 is symmetric; then
$$K_k^T = (\mu_k K_{k-1} + H_k^T H_k)^T = \mu_k K_{k-1}^T + H_k^T H_k = K_k. \qquad (24)$$
According to mathematical induction, for any k, Kk is symmetric. As a result, λI+Kk is symmetric.
Positive definiteness: For any ζ = [ζ1, ζ2, …, ζL]T ≠ 0, it holds that
$$\zeta^T K_0 \zeta = \zeta^T H_0^T H_0 \zeta = (H_0\zeta)^T(H_0\zeta) = \Big(\sum_{i=1}^{L}\zeta_i G(a_i,b_i,x_1)\Big)^2 + \Big(\sum_{i=1}^{L}\zeta_i G(a_i,b_i,x_2)\Big)^2 + \cdots + \Big(\sum_{i=1}^{L}\zeta_i G(a_i,b_i,x_{N_0})\Big)^2 \geq 0. \qquad (25)$$
Suppose Kk-1 is positive semi-definite, that is,
$$\zeta^T K_{k-1}\zeta \geq 0, \qquad (26)$$
then,
$$\zeta^T \mu_k K_{k-1}\zeta \geq 0. \qquad (27)$$
Similar to Equation (25), it holds that
$$\zeta^T H_k^T H_k \zeta \geq 0, \qquad (28)$$
Then,
$$\zeta^T K_k\zeta = \zeta^T(\mu_k K_{k-1} + H_k^T H_k)\zeta = \zeta^T\mu_k K_{k-1}\zeta + \zeta^T H_k^T H_k\zeta \geq 0 \qquad (29)$$
Additionally,
$$\zeta^T\lambda I\zeta = \lambda\sum_{i=1}^{L}\zeta_i^2 > 0. \qquad (30)$$
In conclusion, λI + Kk is a symmetric positive definite matrix. □
Denote B = λI + Kk; then B can be uniquely factorized in Cholesky form, i.e., B = UTU, where U is an upper triangular matrix. U can be calculated by the following formulas [40]:
$$u_{ii} = \Big(b_{ii} - \sum_{d=1}^{i-1}u_{di}^2\Big)^{1/2} = \Big(\lambda + K_k(i,i) - \sum_{d=1}^{i-1}u_{di}^2\Big)^{1/2}, \quad i = 1,\ldots,L. \qquad (31)$$
$$u_{ij} = \Big(b_{ij} - \sum_{d=1}^{i-1}u_{di}u_{dj}\Big)\Big/u_{ii} = \Big(K_k(i,j) - \sum_{d=1}^{i-1}u_{di}u_{dj}\Big)\Big/u_{ii}, \quad j = i+1,\ldots,L. \qquad (32)$$
Equation (23) can then be solved via the following two triangular systems:
$$U^T P = Q_k, \qquad (33)$$
$$U\beta_k = P. \qquad (34)$$
Denote Qk = [q1, q2, …, qL]T. Applying forward substitution and then back substitution, the solution to Equation (23), viz., the coefficient vector βk in Equation (1), may be obtained as follows:
$$p_i = \Big(q_i - \sum_{d=1}^{i-1}u_{di}p_d\Big)\Big/u_{ii}, \quad i = 1,\ldots,L \qquad (35)$$
$$\beta_{k,i} = \Big(p_i - \sum_{d=i+1}^{L}u_{id}\beta_{k,d}\Big)\Big/u_{ii}, \quad i = L,\ldots,1 \qquad (36)$$
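The factorization and substitution steps of Equations (31)–(36) can be sketched in NumPy as follows; this is a minimal illustrative implementation rather than the authors' code, and the final assertion simply checks the result against a direct solve on random data.

```python
import numpy as np

def cholesky_solve(K_k, Q_k, lam=0.001):
    """Solve (lam*I + K_k) beta_k = Q_k (Equation (23)) by the element-wise
    Cholesky factorization of Equations (31)-(32), the forward substitution
    of Equation (35) and the back substitution of Equation (36)."""
    L = K_k.shape[0]
    B = lam * np.eye(L) + K_k
    U = np.zeros((L, L))
    for i in range(L):                               # Equations (31) and (32)
        U[i, i] = np.sqrt(B[i, i] - U[:i, i] @ U[:i, i])
        U[i, i + 1:] = (B[i, i + 1:] - U[:i, i] @ U[:i, i + 1:]) / U[i, i]
    p = np.zeros(L)
    for i in range(L):                               # Equation (35): U^T p = Q_k
        p[i] = (Q_k[i] - U[:i, i] @ p[:i]) / U[i, i]
    beta = np.zeros(L)
    for i in range(L - 1, -1, -1):                   # Equation (36): U beta_k = p
        beta[i] = (p[i] - U[i, i + 1:] @ beta[i + 1:]) / U[i, i]
    return beta

# quick consistency check against a direct solve on random data
rng = np.random.default_rng(0)
Hk = rng.standard_normal((30, 5)); Yk = rng.standard_normal(30)
Kk, Qk = Hk.T @ Hk, Hk.T @ Yk
assert np.allclose(cholesky_solve(Kk, Qk), np.linalg.solve(0.001 * np.eye(5) + Kk, Qk))
```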
The CF-OSELM-PRFF algorithm can be summarized as follows.
Step 1: Preparation.
(1) Choose the hidden output function G(a, b, x) of the SLFN and the number of hidden nodes L; determine λ and μ.
(2) Randomly assign hidden parameters (ai, bi), i=1,2,…,L.
Step 2: Initialization.
(1) Acquire the initial data chunk S0 (N0 ≥ 2).
(2) Calculate H0, Y0.
(3) Calculate K0 by Equation (12), calculate Q0 by Equation (21).
(4) Calculate the Cholesky factor U of λI + K0 by Equations (31) and (32).
(5) Calculate β0 by Equations (35) and (36).
(6) For an input x, the predicted output value is $\hat{y} = h(x)\beta_0$.
Step 3: Online modeling and prediction, i.e., repeat the following substeps.
(1) Acquire the kth (k ≥ 1) data chunk Sk
(2) Calculate Hk, Yk.
(3) Calculate Kk by Equation (14), calculate Qk by Equation (22).
(4) Calculate the Cholesky factor U of λI + Kk by Equations (31) and (32).
(5) Calculate βk by Equations (35) and (36).
(6) For an input x, the predicted output value is $\hat{y} = h(x)\beta_k$.
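Putting the steps together, a compact sketch of the whole online loop might look as follows. It reuses the cholesky_solve routine sketched above; the chunk iterable, the sigmoid hidden function, and the default parameter values are illustrative assumptions.

```python
import numpy as np

def sigmoid_H(X, A, b):
    """Hidden-layer output matrix h(x) for an input chunk X of shape (N, n)."""
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

def cf_oselm_prff(chunks, n, L=25, lam=0.001, mu=0.99, seed=0):
    """CF-OSELM-PRFF (Steps 1-3): yields (A, b, beta_k) after every chunk.
    `chunks` is any iterable of (X_k, Y_k) pairs; the first pair plays the
    role of the initialization chunk S_0."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, (L, n))        # Step 1: random hidden parameters
    b = rng.uniform(-1.0, 1.0, L)
    K, Q, first = np.zeros((L, L)), np.zeros(L), True
    for X_k, Y_k in chunks:
        H_k = sigmoid_H(X_k, A, b)
        if first:                             # Step 2: Equations (12) and (21)
            K, Q, first = H_k.T @ H_k, H_k.T @ Y_k, False
        else:                                 # Step 3: Equations (14) and (22)
            K = mu * K + H_k.T @ H_k
            Q = mu * Q + H_k.T @ Y_k
        beta = cholesky_solve(K, Q, lam)      # Equations (31)-(36)
        yield A, b, beta                      # predict with sigmoid_H(x, A, b) @ beta
```

Because only the L × L matrix K and the L-vector Q are carried between steps, no past samples need to be stored.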

4. Experimental Results and Analysis

In this section, the performance of the presented CF-OSELM-PRFF is verified on a time-varying nonlinear process identification task, two chaotic time series, and one electricity demand prediction task. These simulations assess the computational complexity (running time) and accuracy of CF-OSELM-PRFF in comparison with FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM [21]. FOS-MELM is an online sequential multiple-hidden-layer extreme learning machine with a forgetting mechanism, which was recently proposed by Xiao et al. To make the results of FOS-MELM more stable, a regularization term is introduced into its solving process according to [20].
For these online algorithms, the common regularization parameter λ is set as 0.001 and the forgetting factor as μ = 0.99. For AFGR-OSELM, the adaptive forgetting factor is tuned in the interval [0.8, 0.999] with an initial value of 0.995, and its other specific parameters are set according to [28].
The output of a hidden node with respect to the input x of the SLFN in Equation (1) is set as the sigmoid function, i.e., G(a, b, x) = 1/(1 + exp(−(a · x + b))); the components of a, i.e., the input weights, and the bias b are randomly chosen from the range [−1, 1]. Specially, the hyperbolic tangent function G(a, b, x) = (1 − exp(−(a · x + b)))/(1 + exp(−(a · x + b))) is selected as the activation function in FOS-MELM.
For FOS-MELM, every training data chunk contains only one sample, and each sample remains valid for s time units; the parameter s is set as s = N0.
In order to observe performance of these approaches under various situations, the number L of hidden nodes of SLFN is set as 25, 50, 100, 150, 200, and the corresponding number N0 of initial training samples is assigned to 50, 100, 200, 200, 200, respectively.
The root mean square error (RMSE) of prediction is regarded as the measure of model accuracy:
$$\mathrm{RMSE} = \sqrt{\frac{1}{t_2 - t_1 + 1}\sum_{i=t_1}^{t_2}(\hat{y}_i - y_i)^2} \qquad (37)$$
The relative efficiencies of CF-OSELM-PRFF to its counterparts are measured by speedup ratios. The speedups of CF-OSELM-PRFF to the other related methods are defined as
$$\mathrm{speedup}_1 = \frac{\text{total running time of FP-ELM}}{\text{total running time of CF-OSELM-PRFF}} \qquad (38)$$
$$\mathrm{speedup}_2 = \frac{\text{total running time of FGR-OSELM}}{\text{total running time of CF-OSELM-PRFF}} \qquad (39)$$
$$\mathrm{speedup}_3 = \frac{\text{total running time of AFGR-OSELM}}{\text{total running time of CF-OSELM-PRFF}} \qquad (40)$$
$$\mathrm{speedup}_4 = \frac{\text{total running time of FOS-MELM}}{\text{total running time of CF-OSELM-PRFF}} \qquad (41)$$
All the performance assessments were carried out in MATLAB R2010b 32-bit environment running on Windows 7 32-bit with Intel Core i3-3220 3.3 GHz CPU and 4 GB RAM.

4.1. Time-varying Nonlinear Process Identification

The identified unknown system is a modified version of the one addressed in [41]; by changing the constants and the coefficients of the variables, the time-varying system is expressed as follows:
$$y(k+1) = \begin{cases} \dfrac{y(k)}{1 + y(k)^2} + u(k)^3, & k \leq 100 \\[2pt] \dfrac{2y(k)}{2 + y(k)^2} + u(k)^3, & 100 < k \leq 300 \\[2pt] \dfrac{y(k)}{1 + 2y(k)^2} + 2u(k)^3, & 300 < k \end{cases} \qquad (42)$$
The system (42) can be expressed as follows:
$$y(k) = f(x(k)) \qquad (43)$$
where f(x) is a nonlinear function, x(k) is the regression input data vector
$$x(k) = [y(k-1), y(k-2), \ldots, y(k-n_y);\ u(k-n_d), u(k-n_d-1), \ldots, u(k-n_u)], \qquad (44)$$
with ny, nd and nu being model structure parameters; they are set as ny = 3, nd = 1 and nu = 2 here. When the SLFN is applied to approximate Equation (43), (x(k), y(k)) is the learning sample (xk, yk) of the SLFN.
Denote k0 = N0+max (ny, nu) − nd, k1 = k0 + 500. The system input is set as follows:
$$u(k) = \begin{cases} \mathrm{rand}() - 0.5, & k \leq k_0 \\ \sin\big(2\pi(k-k_0)/120\big), & k_0 < k \leq k_1 \\ \sin\big(2\pi(k-k_1)/50\big), & k_1 < k \end{cases} \qquad (45)$$
where rand() generates random numbers uniformly distributed in the interval (0, 1).
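For concreteness, a sketch that generates the process of Equation (42) driven by the input of Equation (45) is given below; the function name and default values are assumptions, and the training samples (x(k), y(k)) are then formed from the lagged y and u values according to Equation (44).

```python
import numpy as np

def generate_process(num_steps, N0=50, ny=3, nu=2, nd=1, seed=0):
    """Generate the time-varying process of Equation (42) driven by the
    input of Equation (45); returns the input and output sequences."""
    rng = np.random.default_rng(seed)
    k0 = N0 + max(ny, nu) - nd
    k1 = k0 + 500
    u = np.zeros(num_steps + 1)
    y = np.zeros(num_steps + 2)
    for k in range(num_steps + 1):
        if k <= k0:                                   # random excitation stage
            u[k] = rng.random() - 0.5
        elif k <= k1:
            u[k] = np.sin(2.0 * np.pi * (k - k0) / 120.0)
        else:
            u[k] = np.sin(2.0 * np.pi * (k - k1) / 50.0)
        if k <= 100:                                  # the three regimes of Equation (42)
            y[k + 1] = y[k] / (1.0 + y[k] ** 2) + u[k] ** 3
        elif k <= 300:
            y[k + 1] = 2.0 * y[k] / (2.0 + y[k] ** 2) + u[k] ** 3
        else:
            y[k + 1] = y[k] / (1.0 + 2.0 * y[k] ** 2) + 2.0 * u[k] ** 3
    return u, y
```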
The simulations are carried out for different numbers of steps and hidden nodes. The efficiency comparison between CF-OSELM-PRFF and FP-ELM, FGR-OSELM, AFGR-OSELM, and FOS-MELM is listed in Table 1. Due to the randomness of the parameters a and b and of u(k) during the initial stage (k ≤ k0), along with the intrinsic variability of the computing environment, the simulation results naturally vary. Consequently, for each case, every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is marked in bold.
Table 1 shows that, with the same number of hidden nodes and the same number of simulation prediction steps, CF-OSELM-PRFF statistically costs the least time among the five approaches; therefore, CF-OSELM-PRFF has an obvious speed advantage over FP-ELM. Moreover, as the number of hidden nodes increases, the speedup tends to become larger. FOS-MELM trains a three-hidden-layer feedforward neural network and involves complex calculation steps; thus, it costs the most time.
Table 2 displays a prediction RMSE comparison of CF-OSELM-PRFF to its counterparts. Every RMSE is also an average over 5 independent trials of each algorithm performing the set number of steps. In each set of results, the best one is marked in bold. From Table 2, it can be seen that there is no apparent difference in predictive accuracy among FP-ELM, FOS-MELM and CF-OSELM-PRFF; in some setups, FGR-OSELM and AFGR-OSELM can provide satisfactory, or even the best, forecasts, but they cannot work optimally in certain cases. When the number of simulation prediction steps is set to 2500 or 3000, FGR-OSELM sometimes produces bad or low-accuracy predictions in the later stages of the simulation process, and therefore the RMSE of FGR-OSELM becomes very large. Additionally, when the number of hidden nodes is set at 100, 150 or 200 and the number of simulation prediction steps is set at 1000 or more, AFGR-OSELM usually runs unstably and produces a very large RMSE. The reason is that, in the recursive formulas of FGR-OSELM and AFGR-OSELM, the approximate calculation of the inverse of the related matrix yields errors which may propagate and accumulate; as a result, the algorithm is apt to produce unreliable results.
To intuitively observe and compare the accuracy and stability of these online sequential RELMs, L = 25, N0 = 50, and the number of simulation prediction steps are fixed at 3000; FGR-OSELM is executed repeatedly until an unstable predictive scenario occurs, the corresponding prediction error (predicted value minus real value) curves are plotted in Figure 1a, and the current a, b values and the initial u(k) (k ≤ k0) signal are saved; these are referred to as the adverse a, b values and initial u(k) signal. Subsequently, FP-ELM, CF-OSELM-PRFF and FOS-MELM are executed with the adverse a, b values and initial u(k) signal, respectively, and the prediction error curves of the three approaches are shown in Figure 1b–d. Clearly, Figure 1a shows that the prediction errors of FGR-OSELM become very large and completely meaningless when the prediction step exceeds a certain limit. There are a few large peaks at certain instants which reveal the instability of FGR-OSELM arising from its recursive approximate calculation. In order to explicitly exhibit the variation of the prediction error of FGR-OSELM, only the partial results within the first 2453 steps are presented. Additionally, Figure 1b–d show that the prediction effect of CF-OSELM-PRFF is similar to that of FP-ELM and FOS-MELM; in other words, they possess the same ideal predictive performance.
With L = 100, N0 = 200, and 3000 simulation prediction steps, AFGR-OSELM is executed repeatedly until an unstable predictive scenario occurs; the corresponding prediction error curves are plotted in Figure 2a, and the current a, b values and initial u(k) (k ≤ k0) signal, i.e., the adverse a, b values and initial u(k) signal, are saved. Then, FP-ELM, CF-OSELM-PRFF and FOS-MELM are executed with the adverse a, b values and initial u(k) signal, respectively, and the prediction error curves of the three approaches are shown in Figure 2b–d. Clearly, Figure 2a shows that the prediction error of AFGR-OSELM becomes larger at the 872nd step, reaching −67.7839. Actually, at the 878th step, the prediction error of AFGR-OSELM suddenly reaches −25286.1257. Excessively large prediction errors are not marked in Figure 2a; thus, only the partial results prior to the 878th step are presented. Additionally, Figure 2b,c show that CF-OSELM-PRFF possesses the same excellent predictive performance as FP-ELM. Figure 2d seems to show that FOS-MELM is slightly better than CF-OSELM-PRFF, but there is only a small difference between their RMSEs.
The above experiments indicate that, for the parameter settings with which FGR-OSELM or AFGR-OSELM produces large prediction errors, CF-OSELM-PRFF, FP-ELM and FOS-MELM can run stably and provide satisfactory predictions.

4.2. Lorenz Time Series Prediction

The Lorenz system is a three-dimensional dynamical system that exhibits chaotic flow; it is described by the following equations [42,43]:
$$\frac{dx(t)}{dt} = \sigma[y(t) - x(t)], \quad \frac{dy(t)}{dt} = rx(t) - y(t) - x(t)z(t), \quad \frac{dz(t)}{dt} = x(t)y(t) - bz(t) \qquad (46)$$
where x(t), y(t), and z(t) are the values of time series at time t. A typical choice for the parameter values is σ = 10, r = 28, and b = 8/3.
In this example, observations of the continuous x(t), y(t), and z(t) at a sequence of sampling time points are concerned. Hence, the function ode45, a standard solver for ordinary differential equations in MATLAB, is applied to generate sampling values of x(t), y(t), and z(t). The routine implements a fourth/fifth-order Runge–Kutta method with adaptive integration step size for efficient computation. The initial state is set as x(0) = 2, y(0) = 3, and z(0) = 4.
Let Ts denote the sampling period. At each sampling time point kTs, ([x(kTs − Ts), x(kTs − 2Ts), ..., x(kTs − nyTs)], x(kTs)) is a training sample for ELM. In one-step-ahead time series prediction, the common way is to use xk+1 = [x(kTs), x(kTs − Ts), ..., x(kTs − (ny − 1)Ts)] to calculate the predicted value of x(kTs + Ts), namely, $\hat{x}(kT_s + T_s)$. In the simulation, the sampling period is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Lorenz time series is considered for prediction.
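A possible way to generate the sampled Lorenz series and the delay-embedded training pairs in Python, using SciPy's solve_ivp (which, like ode45, defaults to an adaptive Runge–Kutta (4,5) method), is sketched below; the number of samples and the tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), r * x - y - x * z, x * y - b * z]

Ts, n_samples, ny = 0.02, 4000, 3
t_eval = np.arange(n_samples) * Ts
sol = solve_ivp(lorenz, (0.0, t_eval[-1]), [2.0, 3.0, 4.0],
                t_eval=t_eval, rtol=1e-8, atol=1e-10)   # RK45, analogous to ode45
x = sol.y[0]                                            # sampled x-coordinate

# delay embedding: predict x(kTs) from its previous ny sampled values
X = np.column_stack([x[ny - 1 - j: len(x) - 1 - j] for j in range(ny)])
Y = x[ny:]
```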
In order to verify the computational efficiency of CF-OSELM-PRFF compared to FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM, the running times of these algorithms and speedups of CF-OSELM-PRFF to the other algorithms are tabulated in Table 3. Every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. As seen in Table 3, CF-OSELM-PRFF clearly outperforms the other algorithms in terms of speed.
Table 4 shows the prediction RMSE of the five methods. Every RMSE is also an average value over 5 independent trials. In each set of results, the best one is marked in bold. As shown in Table 4, on the whole, the prediction behaviors of these methods in this simulation are basically similar to those in the first simulation. When the number of simulation prediction steps is set to 3000, FGR-OSELM occasionally produces poor predictions in the later stage of the simulation process; thus, the RMSE of FGR-OSELM becomes larger. Unexpectedly, in many cases, AFGR-OSELM cannot provide reasonable predictions.
Analogously, an intuitive comparison of the prediction results of these algorithms is also made. With L = 200, N0 = 200, and the number of simulation prediction steps fixed at 3000, FGR-OSELM is run repeatedly until an unstable predictive scenario appears; the corresponding prediction error curves are illustrated in Figure 3a, and the current a, b values, i.e., the adverse a, b values, are saved. Subsequently, AFGR-OSELM, FP-ELM, CF-OSELM-PRFF and FOS-MELM are executed with the adverse a, b values, respectively, and their prediction error curves are shown in Figure 3b–e.
Figure 3a shows that the prediction errors of FGR-OSELM become very large when the prediction step exceeds a certain limit. In Figure 3b, only the curve from the 450th step to the 540th step is plotted, because after the 540th step AFGR-OSELM yields excessively large prediction errors at many time points. The recursive approximate calculations of FGR-OSELM and AFGR-OSELM result in instability in certain settings. Figure 3c,d show that the prediction error curve of CF-OSELM-PRFF is extremely similar to that of FP-ELM. Comparing Figure 3d,e, it can be found that the prediction result of CF-OSELM-PRFF is slightly better than that of FOS-MELM.
Although normalizing the time series values, i.e., the input data of the SLFN, into the interval [0,1] or [−1,1] can significantly improve the stability of FGR-OSELM and AFGR-OSELM, it is difficult to obtain the maximum and minimum values of the input data in some practical online modelling tasks. Thus, normalization of the input data is sometimes infeasible. In this example, CF-OSELM-PRFF and FP-ELM can train the SLFN and provide satisfactory results using raw data; they are less susceptible to the range of the input data than FGR-OSELM and AFGR-OSELM.

4.3. Rössler Time Series Prediction

The Rössler system is one of the most famous chaotic systems, though it is an artificial system designed solely with the purpose of creating a model for a strange attractor [44,45,46]. The Rössler time series is generated from the following differential equations:
$$\frac{dx(t)}{dt} = -y(t) - z(t), \quad \frac{dy(t)}{dt} = x(t) + d\,y(t), \quad \frac{dz(t)}{dt} = e + z(t)\big(x(t) - f\big) \qquad (47)$$
where x(t), y(t), and z(t) are the values of time series at time t. d, e, f are the control parameters and they are set as d = 0.15, e = 0.2 and f = 10.
The way to generate sampling values of x(t), y(t), and z(t) is the same as that described in the previous example. The initial condition is set as x(0) = 0.05, y(0) = 0.05, and z(0) = 0.05. The sampling interval is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Rössler system is considered for prediction.
In this simulation, the experimental design is the same as that in the previous simulations. The running times of these algorithms and the speedups of CF-OSELM-PRFF relative to the other algorithms are recorded in Table 5. Every running time is an average over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 5, it is clear that CF-OSELM-PRFF is superior to the other methods in terms of efficiency.
Table 6 shows the prediction RMSE of the five methods. Every RMSE is an average over 5 independent trials. In each set of results, the best one is marked in bold. Different from the previous two experiments, here FGR-OSELM only occasionally behaves unstably in the later segment of the simulation process when the number of simulation prediction steps is set to 3000. Moreover, AFGR-OSELM behaves stably with a larger probability: if AFGR-OSELM is run for only 5 successive trials, unreasonable prediction results rarely appear. Thus, its performance is not investigated in the following contrastive demonstration. Additionally, in many cases FOS-MELM achieves the best results; as the number of nodes increases, FOS-MELM yields higher accuracy than CF-OSELM-PRFF at the expense of requiring more time.
Accordingly, for the case of L = 25, N0 = 50 and 3000 simulation prediction steps, FGR-OSELM, CF-OSELM-PRFF, FP-ELM and FOS-MELM are run with the same adverse a, b values, and their prediction error curves are plotted in Figure 4. Figure 4a shows that FGR-OSELM works well at first but fails afterwards. Figure 4b,c show that CF-OSELM-PRFF and FP-ELM provide almost the same good prediction results. Contrasting Figure 4c,d, it can be found that the error range of CF-OSELM-PRFF is smaller than that of FOS-MELM; in other words, the former yields better forecasts than the latter in the late stage.

4.4. Experiment on Real Data Set

Electricity load forecasting plays an important part in the strategy management of electric power systems. Here, an electricity demand time series (EDTS) [47] is utilized to test the performance of these online algorithms; the EDTS consists of a sequence of 15-minute averaged values of power demand. The first 3500 values of the EDTS are shown in Figure 5.
Before training the model, the data are normalized into [−1,1] by Equation (48); after forecasting, the predicted values are denormalized by Equation (49).
$$y(k) \leftarrow \frac{2\big(y(k) - \min(y)\big)}{\max(y) - \min(y)} - 1 \qquad (48)$$
$$\hat{y}(k) \leftarrow \frac{\hat{y}(k) + 1}{2}\big(\max(y) - \min(y)\big) + \min(y) \qquad (49)$$
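A small sketch of the normalization and denormalization of Equations (48) and (49) is given below; min(y) and max(y) are assumed to be taken over the available training portion of the series, and the function names are illustrative.

```python
import numpy as np

def normalize(y, y_min, y_max):
    """Equation (48): map raw load values into [-1, 1]."""
    return 2.0 * (np.asarray(y) - y_min) / (y_max - y_min) - 1.0

def denormalize(y_hat, y_min, y_max):
    """Equation (49): map predictions back to the original scale."""
    return (np.asarray(y_hat) + 1.0) / 2.0 * (y_max - y_min) + y_min
```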
In this example, the experiment is designed like the previous ones. The running times of these algorithms and the speedups of CF-OSELM-PRFF relative to its counterparts are recorded in Table 7. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 7, it is clear that CF-OSELM-PRFF runs faster than the other methods.
Table 8 shows the prediction RMSE of the five methods. In each set of results, the best one is marked in bold. CF-OSELM-PRFF produces nearly the same level of accuracy as FP-ELM, but statistically higher accuracy than FOS-MELM. FGR-OSELM runs unstably in only one case, but AFGR-OSELM does so in many cases. In addition, different from the previous example, FOS-MELM does not achieve the best results here; CF-OSELM-PRFF yields higher accuracy than FOS-MELM in many cases.

4.5. Discussion

The above experiments show that CF-OSELM-PRFF has greater time efficiency than several other related approaches; the speedup ratio of CF-OSELM-PRFF to the other approaches is mainly influenced by the number of hidden nodes of the SLFN. CF-OSELM-PRFF achieves a speedup of around 1.4 to 2.0 over FP-ELM. This speedup can facilitate its use in real applications, and even in real-time applications.
CF-OSELM-PRFF can provide the same predictive accuracy as FP-ELM, and better stability than FGR-OSELM and AFGR-OSELM. Additionally, the experiments also show that there is no obvious difference between the predictive accuracy of CF-OSELM-PRFF and that of FOS-MELM. In the third simulation, FOS-MELM outperformed CF-OSELM-PRFF statistically, whereas CF-OSELM-PRFF surpassed FOS-MELM in the fourth one.
CF-OSELM-PRFF can learn arriving data one-by-one or chunk-by-chunk without the need for storing the training samples accumulated thus far; it is suitable for storage capacity-constrained computing devices.
In the above experiments, CF-OSELM-PRFF adopted a fixed forgetting factor to reduce the contribution of old samples; in fact, it could incorporate variable forgetting factor schemes, such as the one reported in [39].

5. Conclusions

Regularization plays an important role in RELM, but in the cost function or recursive solution of many online sequential RELMs, the regularization effect decays gradually over time. Fortunately, FP-ELM, FGR-OSELM and AFGR-OSELM can maintain a persistent regularization effect throughout the whole learning process. They share the same cost function but employ different solving processes.
This paper makes full use of the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF, and factorizes the matrix in Cholesky form to solve the equations in every prediction step. Finally, a new solving method for OSELM-PRFF, i.e., CF-OSELM-PRFF, is developed. The proposed method is fast and reliable; it is well suited for fast, and even real-time, modelling of time-varying nonlinear systems.
The regularization term in CF-OSELM-PRFF does not decay over time, but a constant regularization parameter makes CF-OSELM-PRFF deficient in terms of adaptability. Therefore, it would be worthwhile to design a highly efficient method to adjust the regularization parameter.

Author Contributions

X.Z. and X.K. conceived and developed the algorithm; X.K. designed the experiments; X.Z. performed the experiments and analyzed the results; X.Z. and X.K. wrote the manuscript.

Funding

The work is supported by the Hunan Provincial Science and Technology Foundation of China (2011FJ6033); the National Natural Science Foundation of China (No. 61502540); National Science Foundation of Hunan Province (No. 2019JJ40406).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Park, J.; Sandberg, I.W. Universal approximation using radial basis function networks. Neural Comput. 1991, 3, 246–257. [Google Scholar] [CrossRef] [PubMed]
  2. Huang, G.B.; Chen, Y.Q.; Babri, H.A. Classification ability of single hidden layer feedforward neural networks. IEEE Trans. Neural Netw. 2000, 11, 799–801. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  3. Ferrari, S.; Stengel, R.F. Smooth function approximation using neural networks. IEEE Trans. Neural Netw. 2005, 16, 24–38. [Google Scholar] [CrossRef] [PubMed]
  4. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: a new learning scheme of feedforward neural networks. In Proceedings of the international joint conference on neural networks, Budapest, Hungary, 25–29 July 2004. [Google Scholar]
  5. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  6. Wu, Y.; Liu, D.; Jiang, H. Length-changeable incremental extreme learning machine. J. Comput. Sci. Technol. 2017, 32, 630–643. [Google Scholar] [CrossRef]
  7. Zhu, C.; Zou, B.; Zhao, R.; Cui, J.; Duan, X.; Chen, Z.; Liang, Y. Retinal vessel segmentation in colour fundus images using Extreme Learning Machine. Comput. Med. Imag. Gr. 2017, 55, 68–77. [Google Scholar] [CrossRef] [PubMed]
  8. Liu, H.; Tian, H.Q.; Li, Y.F. Four wind speed multi-step forecasting models using extreme learning machines and signal decomposing algorithms. Energy Convers. Manag. 2015, 100, 16–22. [Google Scholar] [CrossRef]
  9. Mi, X.W.; Liu, H.; Li, Y.F. Wind speed forecasting method using wavelet, extreme learning machine and outlier correction algorithm. Energy Convers. Manag. 2017, 151, 709–722. [Google Scholar] [CrossRef]
  10. Sattar, A.M.A.; Ertuğrul, Ö. F.; Gharabaghi, B.; McBean, E.A.; Cao, J. Extreme learning machine model for water network management. Neural Comput. Appl. 2019, 31, 157–169. [Google Scholar] [CrossRef]
  11. Yang, Y.; Lin, X.; Miao, Z.; Yuan, X.; Wang, Y. Predictive Control Strategy Based on Extreme Learning Machine for Path-Tracking of Autonomous Mobile Robot. Intell. Auto. Soft Comput. 2015, 21, 1–19. [Google Scholar] [CrossRef]
  12. Salmeron, J.L.; Ruiz-Celma, A. Elliot and Symmetric Elliot Extreme Learning Machines for Gaussian Noisy Industrial Thermal Modelling. Energies 2019, 12, 90. [Google Scholar] [CrossRef]
  13. Rodriguez, N.; Alvarez, P.; Barba, L.; Cabrera-Guerrero, G. Combining Multi-Scale Wavelet Entropy and Kernelized Classification for Bearing Multi-Fault Diagnosis. Entropy 2019, 21, 152. [Google Scholar] [CrossRef]
  14. Demertzis, K.; Tziritas, N.; Kikiras, P.; Sanchez, S.L.; Iliadis, L. The Next Generation Cognitive Security Operations Center: Adaptive Analytic Lambda Architecture for Efficient Defense against Adversarial Attacks. Big Data Cogn. Comput. 2019, 3, 6. [Google Scholar] [CrossRef]
  15. Sonobe, R. Parcel-Based Crop Classification Using Multi-Temporal TerraSAR-X Dual Polarimetric Data. Remote Sens. 2019, 11, 1148. [Google Scholar] [CrossRef]
  16. Salerno, V.M.; Rabbeni, G. An Extreme Learning Machine Approach to Effective Energy Disaggregation. Electronics 2018, 7, 235. [Google Scholar] [CrossRef]
  17. Kasun, L.L.C.; Zhou, H.; Huang, G.B.; Vong, C.M. Representational learning with ELMs for big data. IEEE Intell. Syst. 2013, 28, 31–34. [Google Scholar]
  18. Ding, S.; Zhang, N.; Xu, X.; Guo, L.; Zhang, J. Deep Extreme Learning Machine and Its Application in EEG Classification. Math. Probl. Eng. 2015. [Google Scholar] [CrossRef]
  19. Yang, Y.; Wu, Q.M.J. Multilayer extreme learning machine with subnetwork nodes for representation learning. IEEE Trans. Cybern. 2016, 46, 2570–2583. [Google Scholar] [CrossRef] [PubMed]
  20. Xiao, D.; Li, B.; Mao, Y. A Multiple Hidden Layers Extreme Learning Machine Method and Its Application. Math. Probl. Eng. 2017. [Google Scholar] [CrossRef]
  21. Xiao, D.; Li, B.; Zhang, S. An online sequential multiple hidden layers extreme learning machine method with forgetting mechanism. Chemom. Intell. Lab. Syst. 2018, 176, 126–133. [Google Scholar] [CrossRef]
  22. Yang, Y.; Wu, Q.M.J.; Wang, Y. Autoencoder with invertible functions for dimension reduction and image reconstruction. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 1065–1079. [Google Scholar] [CrossRef]
  23. Yang, J.; Sun, W.; Liu, N.; Chen, Y.; Wang, Y.; Han, S. A Novel Multimodal Biometrics Recognition Model Based on Stacked ELM and CCA Methods. Symmetry 2018, 10, 96. [Google Scholar] [CrossRef]
  24. Liang, N.Y.; Huang, G.B.; Saratchandran, P.; Sundararajan, N. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 2006, 17, 1411–1423. [Google Scholar] [CrossRef] [PubMed]
  25. Frances-Villora, J.V.; Rosado-Muñoz, A.; Bataller-Mompean, M.; Barrios-Aviles, J.; Guerrero-Martinez, J.F. Moving Learning Machine towards Fast Real-Time Applications: A High-Speed FPGA-Based Implementation of the OS-ELM Training Algorithm. Electronics 2018, 7, 308. [Google Scholar] [CrossRef]
  26. Huynh, H.T.; Won, Y. Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks. Patt. Recognit. Lett. 2011, 32, 1930–1935. [Google Scholar] [CrossRef]
  27. Guo, W.; Xu, T. Online sequential extreme learning machine with generalized regularization and forgetting mechanism. Control Decis. 2017, 32, 247–254. [Google Scholar]
  28. Guo, W.; Xu, T.; Tang, K.; Yu, J.; Chen, S. Online Sequential Extreme Learning Machine with Generalized Regularization and Adaptive Forgetting Factor for Time-Varying System Prediction. Math. Probl. Eng. 2018. [Google Scholar] [CrossRef]
  29. Deng, W.Y.; Zheng, Q.H.; Chen, L. Regularized extreme learning machine. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009. [Google Scholar]
  30. Ding, S.; Ma, G.; Shi, Z. A Rough RBF Neural Network Based on Weighted Regularized Extreme Learning Machine. Neural Process. Lett. 2014, 40, 245–260. [Google Scholar] [CrossRef]
  31. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 513–529. [Google Scholar] [CrossRef]
  32. Er, M.J.; Shao, Z.; Wang, N. A study on the randomness reduction effect of extreme learning machine with ridge regression. In Proceedings of the Advances in Neural Networks—ISNN 2013, 10th International Symposium on Neural Networks, Dalian, China, 4–6 July 2013. [Google Scholar]
  33. Shao, Z.; Er, M.J.; Wang, N. An effective semi-cross-validation model selection method for extreme learning machine with ridge regression. Neurocomputing 2015, 151, 933–942. [Google Scholar] [CrossRef]
  34. Zhang, X.; Wang, H.L. Selective forgetting extreme learning machine and its application to time series prediction. Acta Phys. Sinica 2011. [Google Scholar] [CrossRef]
  35. Du, Z.; Li, X.; Zheng, Z.; Zhang, G.; Mao, Q. Extreme learning machine based on regularization and forgetting factor and its application in fault prediction. Chinese J. Instrum. 2015, 36, 1546–1553. [Google Scholar]
  36. Zhang, H.; Zhang, S.; Yin, Y. Online Sequential ELM Algorithm with Forgetting Factor for Real Applications. Neurocomputing 2017, 261, 144–152. [Google Scholar] [CrossRef]
  37. Li, Y.; Zhang, S.; Yin, Y.; Xiao, W.; Zhang, J. A Novel Online Sequential Extreme Learning Machine for Gas Utilization Ratio Prediction in Blast Furnaces. Sensors 2017, 17, 1847. [Google Scholar] [CrossRef] [PubMed]
  38. Wu, Z.; Tang, H.; He, S.; Gao, J.; Chen, X.; To, S.; Li, Y.; Yang, Z. Fast dynamic hysteresis modeling using a regularized online sequential extreme learning machine with forgetting property. Int. J. Adv. Manuf. Technol. 2018, 94, 3473–3484. [Google Scholar] [CrossRef]
  39. Liu, D.; Wu, Y.; Jiang, H. FP-ELM: An online sequential learning algorithm for dealing with concept drift. Neurocomputing 2016, 207, 322–334. [Google Scholar] [CrossRef]
  40. Martin, R.S.; Peters, G.; Wilkinson, J.H. Symmetric decomposition of a positive definite matrix. Num. Math. 1965, 7, 362–383. [Google Scholar] [CrossRef]
  41. Narendra, K.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27. [Google Scholar] [CrossRef] [PubMed]
  42. Lorenz, E.N. Deterministic nonperiodic flows. J. Atmos. Sci. 1963, 20, 130–141. [Google Scholar] [CrossRef]
  43. Meng, Q.F.; Peng, Y.H.; Sun, J. The improved local linear prediction of chaotic time series. Chin. Phys. 2007, 16, 3220–3225. [Google Scholar]
  44. Rössler, O.E. An Equation for Continuous Chaos. Phys. Lett. A. 1976, 57, 397–398. [Google Scholar] [CrossRef]
  45. Peitgen, H.O.; Jürgens, H.; Saupe, D. Chaos and Fractals New Frontiers of Science, 2nd ed.; Springer: New York, NY, USA, 2004; pp. 636–646. [Google Scholar]
  46. Li, D.; Han, M.; Wang, J. Chaotic time series prediction based on a novel robust echo state network. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 787–799. [Google Scholar] [CrossRef] [PubMed]
  47. Applications of Machine Learning Group. Available online: https://research.cs.aalto.fi/aml/datasets.shtml (accessed on 17 May 2019).
Figure 1. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on identification of process (42): (a) Prediction error curve of FGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.
Figure 2. Prediction error curves of relevant approaches with an adverse parameters setting of AFGR-OSELM on identification of process (42): (a) Prediction error curve of AFGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.
Figure 3. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on time series (46): (a) Prediction error curve of FGR-OSELM; (b) That of AFGR-OSELM; (c) That of FP-ELM; (d) That of CF-OSELM-PRFF; (e) That of FOS-MELM.
Figure 4. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on time series (47): (a) Prediction error curve of FGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.
Figure 5. EDTS.
Table 1. Efficiency comparison between CF-OSELM-PRFF and its counterparts on identification of process (42) with input (45).

#nodes | Algorithm | Running time (s) at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps | Speedup1 / Speedup2 / Speedup3 / Speedup4
25 | FP-ELM | 0.076 / 0.100 / 0.162 / 0.218 / 0.278 / 0.340 / 0.413
25 | FGR-OSELM | 0.058 / 0.082 / 0.124 / 0.162 / 0.200 / 0.252 / 0.420
25 | AFGR-OSELM | 0.086 / 0.140 / 0.240 / 0.367 / 0.493 / 0.608 / 0.733
25 | FOS-MELM | 0.182 / 0.345 / 0.668 / 1.021 / 1.379 / 1.761 / 2.122
25 | CF-OSELM-PRFF | 0.060 / 0.079 / 0.108 / 0.152 / 0.194 / 0.234 / 0.272 | 1.444 / 1.182 / 2.428 / 6.807
50 | FP-ELM | 0.098 / 0.146 / 0.256 / 0.358 / 0.456 / 0.564 / 0.662
50 | FGR-OSELM | 0.084 / 0.130 / 0.214 / 0.296 / 0.390 / 0.478 / 0.578
50 | AFGR-OSELM | 0.130 / 0.250 / 0.628 / 0.908 / 1.268 / 1.544 / 1.558
50 | FOS-MELM | 0.581 / 1.132 / 2.150 / 3.238 / 4.234 / 5.264 / 6.314
50 | CF-OSELM-PRFF | 0.078 / 0.114 / 0.186 / 0.256 / 0.328 / 0.426 / 0.474 | 1.364 / 1.165 / 3.376 / 12.306
100 | FP-ELM | 0.194 / 0.340 / 0.646 / 0.978 / 1.366 / 1.731 / 2.044
100 | FGR-OSELM | 0.158 / 0.257 / 0.498 / 0.728 / 1.023 / 1.284 / 1.524
100 | AFGR-OSELM | 0.485 / 0.947 / — / — / — / — / —
100 | FOS-MELM | 2.116 / 4.138 / 7.910 / 11.663 / 15.624 / 19.512 / 23.196
100 | CF-OSELM-PRFF | 0.134 / 0.228 / 0.440 / 0.632 / 0.878 / 1.091 / 1.233 | 1.575 / 1.180 / 3.954 / 18.155
150 | FP-ELM | 0.357 / 0.683 / 1.410 / 1.990 / 2.666 / 3.256 / 3.848
150 | FGR-OSELM | 0.300 / 0.580 / 1.120 / 1.705 / 2.307 / 2.876 / 3.324
150 | AFGR-OSELM | 1.268 / 2.785 / — / — / — / — / —
150 | FOS-MELM | 3.976 / 7.910 / 15.818 / 23.569 / 31.150 / 38.792 / 45.994
150 | CF-OSELM-PRFF | 0.217 / 0.394 / 0.752 / 1.099 / 1.459 / 1.847 / 2.302 | 1.761 / 1.513 / 6.636 / 20.720
200 | FP-ELM | 0.682 / 1.269 / 2.452 / 3.590 / 4.602 / 5.824 / 7.177
200 | FGR-OSELM | 0.662 / 1.212 / 2.342 / 3.599 / 4.708 / 5.748 / 6.854
200 | AFGR-OSELM | 2.970 / 5.818 / — / — / — / — / —
200 | FOS-MELM | 6.662 / 13.140 / 25.994 / 38.692 / 51.636 / 64.915 / 77.460
200 | CF-OSELM-PRFF | 0.360 / 0.722 / 1.269 / 1.956 / 2.472 / 3.148 / 3.876 | 1.854 / 1.820 / 8.119 / 20.175
Note: ‘—’ represents meaninglessness in the case.
Table 2. Prediction RMSE comparison of CF-OSELM-PRFF to its counterparts on identification of process (42) with input (45).

#nodes | Algorithm | RMSE at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps
25 | FP-ELM | 0.0221 / 0.0203 / 0.0674 / 0.0536 / 0.0491 / 0.0471 / 0.0464
25 | FGR-OSELM | 0.0222 / 0.0192 / 0.0665 / 0.0576 / 0.0507 / 332.4410 / 2134.2351
25 | AFGR-OSELM | 0.0239 / 0.0182 / 0.0624 / 0.0535 / 0.0534 / 0.0437 / 0.0439
25 | FOS-MELM | 0.0231 / 0.0242 / 0.0603 / 0.0571 / 0.0559 / 0.0481 / 0.0433
25 | CF-OSELM-PRFF | 0.0212 / 0.0196 / 0.0634 / 0.0553 / 0.0517 / 0.0472 / 0.0468
50 | FP-ELM | 0.0219 / 0.0176 / 0.0641 / 0.0538 / 0.0487 / 0.0458 / 0.0402
50 | FGR-OSELM | 0.0212 / 0.0171 / 0.0627 / 0.0527 / 0.0489 / 0.0433 / 2.2906
50 | AFGR-OSELM | 0.0231 / 0.0182 / 0.0625 / 0.0507 / 0.0444 / 0.0416 / 0.0388
50 | FOS-MELM | 0.0254 / 0.0211 / 0.0681 / 0.0605 / 0.0533 / 0.0457 / 0.0456
50 | CF-OSELM-PRFF | 0.0222 / 0.0176 / 0.0604 / 0.0529 / 0.0492 / 0.0440 / 0.0441
100 | FP-ELM | 0.0323 / 0.0249 / 0.0607 / 0.0564 / 0.0459 / 0.0445 / 0.0396
100 | FGR-OSELM | 0.0328 / 0.0245 / 0.0600 / 0.0544 / 0.0461 / 0.0411 / 2.5203
100 | AFGR-OSELM | 0.0314 / 0.0252 / × / × / × / × / ×
100 | FOS-MELM | 0.0336 / 0.0238 / 0.0742 / 0.0640 / 0.0495 / 0.0459 / 0.0424
100 | CF-OSELM-PRFF | 0.0329 / 0.0244 / 0.0603 / 0.0541 / 0.0475 / 0.0434 / 0.0389
150 | FP-ELM | 0.0320 / 0.0238 / 0.0606 / 0.0511 / 0.0467 / 0.0430 / 0.0383
150 | FGR-OSELM | 0.0322 / 0.0238 / 0.0613 / 0.0526 / 0.0456 / 0.0412 / 36.7141
150 | AFGR-OSELM | 0.0308 / 0.0306 / × / × / × / × / ×
150 | FOS-MELM | 0.0344 / 0.0238 / 0.0721 / 0.0591 / 0.0534 / 0.0456 / 0.0408
150 | CF-OSELM-PRFF | 0.0319 / 0.0242 / 0.0614 / 0.0519 / 0.0471 / 0.0430 / 0.0402
200 | FP-ELM | 0.0317 / 0.0237 / 0.0615 / 0.0526 / 0.0429 / 0.0407 / 0.0362
200 | FGR-OSELM | 0.0314 / 0.0236 / 0.0602 / 0.0513 / 0.0434 / 0.0403 / 0.0427
200 | AFGR-OSELM | 0.0307 / 0.0243 / × / × / × / × / ×
200 | FOS-MELM | 0.0320 / 0.0229 / 0.0670 / 0.0564 / 0.0506 / 0.0428 / 0.0417
200 | CF-OSELM-PRFF | 0.0316 / 0.0238 / 0.0605 / 0.0507 / 0.0441 / 0.0404 / 0.0369
Note: ‘×’ represents nullification owing to the too large RMSE.
Table 3. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of time series (46).

#nodes | Algorithm | Running time (s) at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps | Speedup1 / Speedup2 / Speedup3 / Speedup4
25 | FP-ELM | 0.054 / 0.098 / 0.167 / 0.216 / 0.272 / 0.347 / 0.398
25 | FGR-OSELM | 0.041 / 0.074 / 0.119 / 0.188 / 0.226 / 0.234 / 0.284
25 | AFGR-OSELM | 0.090 / — / — / — / — / — / —
25 | FOS-MELM | 0.194 / 0.345 / 0.703 / 1.047 / 1.386 / 1.710 / 2.053
25 | CF-OSELM-PRFF | 0.052 / 0.068 / 0.112 / 0.146 / 0.183 / 0.240 / 0.260 | 1.462 / 1.098 / 1.731 / 7.007
50 | FP-ELM | 0.066 / 0.114 / 0.222 / 0.326 / 0.463 / 0.524 / 0.632
50 | FGR-OSELM | 0.048 / 0.082 / 0.164 / 0.254 / 0.324 / 0.385 / 0.458
50 | AFGR-OSELM | 0.126 / 0.233 / — / — / — / — / —
50 | FOS-MELM | 0.508 / 0.987 / 2.026 / 3.108 / 4.152 / 5.190 / 6.076
50 | CF-OSELM-PRFF | 0.042 / 0.082 / 0.160 / 0.239 / 0.298 / 0.380 / 0.441 | 1.429 / 1.045 / 2.894 / 13.429
100 | FP-ELM | 0.166 / 0.338 / 0.600 / 0.888 / 1.214 / 1.551 / 1.762
100 | FGR-OSELM | 0.118 / 0.254 / 0.446 / 0.714 / 0.930 / 1.195 / 1.430
100 | AFGR-OSELM | 0.450 / — / — / — / — / — / —
100 | FOS-MELM | 2.010 / 4.028 / 8.097 / 12.069 / 16.094 / 20.318 / 24.174
100 | CF-OSELM-PRFF | 0.108 / 0.206 / 0.380 / 0.584 / 0.804 / 0.981 / 1.150 | 1.547 / 1.208 / 4.167 / 20.601
150 | FP-ELM | 0.354 / 0.667 / 1.346 / 2.123 / 2.877 / 3.563 / 4.154
150 | FGR-OSELM | 0.284 / 0.557 / 1.139 / 1.811 / 2.436 / 3.022 / 3.610
150 | AFGR-OSELM | 1.213 / — / — / — / — / — / —
150 | FOS-MELM | 4.026 / 7.992 / 16.137 / 24.157 / 32.581 / 40.494 / 48.799
150 | CF-OSELM-PRFF | 0.190 / 0.374 / 0.750 / 1.197 / 1.653 / 2.038 / 2.344 | 1.765 / 1.505 / 6.385 / 20.384
200 | FP-ELM | 0.592 / 1.160 / 2.374 / 3.630 / 4.668 / 6.074 / 7.438
200 | FGR-OSELM | 0.558 / 1.172 / 2.193 / 3.366 / 4.396 / 5.588 / 6.734
200 | AFGR-OSELM | 2.502 / — / — / — / — / — / —
200 | FOS-MELM | 6.630 / 13.437 / 26.253 / 39.086 / 53.110 / 66.712 / 79.110
200 | CF-OSELM-PRFF | 0.314 / 0.602 / 1.263 / 1.876 / 2.468 / 3.340 / 4.036 | 1.866 / 1.727 / 7.968 / 20.457
Note: ‘—’ represents meaninglessness in the case.
Table 4. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of time series (46).

#nodes | Algorithm | RMSE at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps
25 | FP-ELM | 0.1095 / 0.2265 / 0.1925 / 0.2194 / 0.2535 / 0.2674 / 0.1814
25 | FGR-OSELM | 0.1882 / 0.2978 / 0.1469 / 0.3452 / 0.3131 / 0.2028 / 9896.2491
25 | AFGR-OSELM | 0.1274 / × / × / × / × / × / ×
25 | FOS-MELM | 0.0796 / 0.2497 / 0.2118 / 0.1749 / 0.2932 / 0.1972 / 0.2887
25 | CF-OSELM-PRFF | 0.1513 / 0.1858 / 0.1804 / 0.1673 / 0.1596 / 0.2128 / 0.1831
50 | FP-ELM | 0.0719 / 0.1025 / 0.1171 / 0.1095 / 0.1284 / 0.0701 / 0.1139
50 | FGR-OSELM | 0.1231 / 0.1478 / 0.1063 / 0.1715 / 0.1597 / 0.1496 / 1651.0173
50 | AFGR-OSELM | 0.1007 / 0.1403 / × / × / × / × / ×
50 | FOS-MELM | 0.0892 / 0.1688 / 0.3740 / 0.1587 / 0.1723 / 0.2471 / 0.1356
50 | CF-OSELM-PRFF | 0.0797 / 0.1325 / 0.0948 / 0.1107 / 0.1010 / 0.1405 / 0.1291
100 | FP-ELM | 0.0839 / 0.0708 / 0.0610 / 0.0604 / 0.0589 / 0.0592 / 0.0745
100 | FGR-OSELM | 0.1027 / 0.1033 / 0.0973 / 0.0628 / 0.0656 / 0.1016 / 0.5117
100 | AFGR-OSELM | 0.0815 / × / × / × / × / × / ×
100 | FOS-MELM | 0.0968 / 0.1061 / 0.0935 / 0.0829 / 0.0922 / 0.0925 / 0.0665
100 | CF-OSELM-PRFF | 0.0794 / 0.0816 / 0.0788 / 0.0779 / 0.0763 / 0.0805 / 0.0701
150 | FP-ELM | 0.0669 / 0.0599 / 0.0493 / 0.0559 / 0.0434 / 0.0522 / 0.0589
150 | FGR-OSELM | 0.0769 / 0.0710 / 0.0567 / 0.0558 / 0.0578 / 0.0670 / 61.2365
150 | AFGR-OSELM | 0.0831 / × / × / × / × / × / ×
150 | FOS-MELM | 0.0825 / 0.0687 / 0.0693 / 0.0612 / 0.0634 / 0.0765 / 0.0686
150 | CF-OSELM-PRFF | 0.0587 / 0.0579 / 0.0561 / 0.0635 / 0.0610 / 0.0579 / 0.0619
200 | FP-ELM | 0.0627 / 0.0480 / 0.0451 / 0.0402 / 0.0462 / 0.0418 / 0.0419
200 | FGR-OSELM | 0.0565 / 0.0504 / 0.0497 / 0.0464 / 0.0515 / 0.0412 / 23.1895
200 | AFGR-OSELM | 0.0693 / × / × / × / × / × / ×
200 | FOS-MELM | 0.0815 / 0.0817 / 0.0505 / 0.0575 / 0.0596 / 0.0496 / 0.0693
200 | CF-OSELM-PRFF | 0.0601 / 0.0460 / 0.0422 / 0.0440 / 0.0517 / 0.0388 / 0.0489
Note: ‘×’ represents nullification owing to the too large RMSE.
Table 5. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of time series (47).

#nodes | Algorithm | Running time (s) at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps | Speedup1 / Speedup2 / Speedup3 / Speedup4
25 | FP-ELM | 0.049 / 0.075 / 0.136 / 0.195 / 0.262 / 0.331 / 0.389
25 | FGR-OSELM | 0.037 / 0.060 / 0.104 / 0.150 / 0.185 / 0.232 / 0.279
25 | AFGR-OSELM | 0.082 / 0.124 / 0.250 / 0.363 / 0.482 / 0.604 / 0.730
25 | FOS-MELM | 0.188 / 0.362 / 0.686 / 1.03 / 1.428 / 1.756 / 2.106
25 | CF-OSELM-PRFF | 0.032 / 0.052 / 0.099 / 0.129 / 0.175 / 0.219 / 0.262 | 1.484 / 1.081 / 2.732 / 7.806
50 | FP-ELM | 0.067 / 0.125 / 0.217 / 0.321 / 0.420 / 0.510 / 0.610
50 | FGR-OSELM | 0.053 / 0.098 / 0.172 / 0.232 / 0.305 / 0.380 / 0.476
50 | AFGR-OSELM | 0.124 / 0.230 / 0.458 / 0.692 / 0.922 / 1.192 / 1.364
50 | FOS-MELM | 0.490 / 0.994 / 2.097 / 3.106 / 4.118 / 5.153 / 6.130
50 | CF-OSELM-PRFF | 0.045 / 0.097 / 0.169 / 0.239 / 0.292 / 0.371 / 0.466 | 1.352 / 1.023 / 2.968 / 13.158
100 | FP-ELM | 0.160 / 0.312 / 0.572 / 0.880 / 1.202 / 1.503 / 1.827
100 | FGR-OSELM | 0.129 / 0.234 / 0.456 / 0.676 / 0.879 / 1.145 / 1.397
100 | AFGR-OSELM | 0.460 / 0.930 / 1.884 / 2.816 / 3.810 / 4.714 / 5.656
100 | FOS-MELM | 2.012 / 4.166 / 8.194 / 12.232 / 16.646 / 20.617 / 24.890
100 | CF-OSELM-PRFF | 0.107 / 0.202 / 0.396 / 0.578 / 0.776 / 1.008 / 1.191 | 1.517 / 1.155 / 4.761 / 20.849
150 | FP-ELM | 0.360 / 0.672 / 1.395 / 2.036 / 2.694 / 3.480 / 4.184
150 | FGR-OSELM | 0.278 / 0.585 / 1.106 / 1.736 / 2.300 / 2.968 / 3.478
150 | AFGR-OSELM | 1.184 / 2.424 / 4.909 / 7.382 / 9.640 / 12.252 / 14.838
150 | FOS-MELM | 4.014 / 8.224 / 16.293 / 24.779 / 33.227 / 41.534 / 49.448
150 | CF-OSELM-PRFF | 0.198 / 0.377 / 0.753 / 1.162 / 1.522 / 1.822 / 2.294 | 1.824 / 1.532 / 6.475 / 21.841
200 | FP-ELM | 0.596 / 1.159 / 2.475 / 3.950 / 5.089 / 6.187 / 7.385
200 | FGR-OSELM | 0.577 / 1.163 / 2.271 / 3.922 / 4.644 / 5.793 / 7.003
200 | AFGR-OSELM | 2.730 / 5.145 / 10.318 / 15.087 / 20.305 / 25.380 / 30.522
200 | FOS-MELM | 6.672 / 13.200 / 26.207 / 39.492 / 52.814 / 65.882 / 78.909
200 | CF-OSELM-PRFF | 0.322 / 0.610 / 1.348 / 2.118 / 2.547 / 3.141 / 3.865 | 1.924 / 1.819 / 7.848 / 20.298
Table 6. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of time series (47).

#nodes | Algorithm | RMSE at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps
25 | FP-ELM | 0.00200 / 0.00254 / 0.00330 / 0.00486 / 0.00590 / 0.00730 / 0.01494
25 | FGR-OSELM | 0.00230 / 0.00304 / 0.00326 / 0.00386 / 0.00462 / 0.00732 / 203.56020
25 | AFGR-OSELM | 0.00246 / 0.00280 / 0.00288 / 0.00318 / 0.00304 / 0.00598 / 0.01324
25 | FOS-MELM | 0.00136 / 0.00124 / 0.00192 / 0.00344 / 0.00566 / 0.01106 / 0.02382
25 | CF-OSELM-PRFF | 0.00194 / 0.00246 / 0.00346 / 0.00464 / 0.00504 / 0.00710 / 0.01222
50 | FP-ELM | 0.00244 / 0.00238 / 0.00326 / 0.00384 / 0.00382 / 0.00528 / 0.01032
50 | FGR-OSELM | 0.00284 / 0.00252 / 0.00284 / 0.00302 / 0.00288 / 0.00462 / 0.01264
50 | AFGR-OSELM | 0.00252 / 0.00244 / 0.00276 / 0.00230 / 0.00317 / 0.00498 / 2.54816
50 | FOS-MELM | 0.00184 / 0.00176 / 0.00204 / 0.00250 / 0.00368 / 0.00696 / 0.01590
50 | CF-OSELM-PRFF | 0.00242 / 0.00240 / 0.00324 / 0.00366 / 0.00410 / 0.00494 / 0.01094
100 | FP-ELM | 0.00180 / 0.00230 / 0.00268 / 0.00272 / 0.00292 / 0.00420 / 0.00820
100 | FGR-OSELM | 0.00138 / 0.00170 / 0.00190 / 0.00168 / 0.00220 / 0.00390 / 0.01366
100 | AFGR-OSELM | 0.00134 / 0.00164 / 0.00180 / 0.00178 / 0.00220 / 0.00354 / 0.00910
100 | FOS-MELM | 0.00100 / 0.00103 / 0.00110 / 0.00134 / 0.00190 / 0.00356 / 0.00784
100 | CF-OSELM-PRFF | 0.00170 / 0.00222 / 0.00266 / 0.00266 / 0.00276 / 0.00428 / 0.00850
150 | FP-ELM | 0.00160 / 0.00202 / 0.00222 / 0.00216 / 0.00234 / 0.00342 / 0.00686
150 | FGR-OSELM | 0.00110 / 0.00132 / 0.00130 / 0.00134 / 0.00176 / 0.00320 / 0.00714
150 | AFGR-OSELM | 0.00104 / 0.00128 / 0.00146 / 0.00138 / 0.00180 / 0.00306 / 0.00638
150 | FOS-MELM | 0.00069 / 0.00077 / 0.00081 / 0.00110 / 0.00154 / 0.00278 / 0.00636
150 | CF-OSELM-PRFF | 0.00160 / 0.00202 / 0.00224 / 0.00220 / 0.00228 / 0.00334 / 0.00674
200 | FP-ELM | 0.00142 / 0.00178 / 0.00190 / 0.00182 / 0.00194 / 0.00300 / 0.00570
200 | FGR-OSELM | 0.00096 / 0.00114 / 0.00112 / 0.00110 / 0.00152 / 0.00294 / 0.00628
200 | AFGR-OSELM | 0.00086 / 0.00108 / 0.00138 / 0.00124 / 0.00154 / 0.00260 / 0.00614
200 | FOS-MELM | 0.00060 / 0.00065 / 0.00067 / 0.00094 / 0.00132 / 0.00244 / 0.00516
200 | CF-OSELM-PRFF | 0.00140 / 0.00176 / 0.00188 / 0.00182 / 0.00196 / 0.00288 / 0.00570
Table 7. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of EDTS.

#nodes | Algorithm | Running time (s) at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps | Speedup1 / Speedup2 / Speedup3 / Speedup4
25 | FP-ELM | 0.058 / 0.102 / 0.178 / 0.254 / 0.336 / 0.418 / 0.506
25 | FGR-OSELM | 0.039 / 0.053 / 0.103 / 0.144 / 0.185 / 0.232 / 0.288
25 | AFGR-OSELM | 0.068 / 0.126 / — / — / — / — / —
25 | FOS-MELM | 0.170 / 0.336 / 0.678 / 1.020 / 1.338 / 1.832 / 2.128
25 | CF-OSELM-PRFF | 0.024 / 0.046 / 0.090 / 0.130 / 0.170 / 0.218 / 0.252 | 1.991 / 1.124 / 2.771 / 8.067
50 | FP-ELM | 0.089 / 0.162 / 0.306 / 0.460 / 0.602 / 0.760 / 0.916
50 | FGR-OSELM | 0.046 / 0.086 / 0.166 / 0.256 / 0.334 / 0.402 / 0.500
50 | AFGR-OSELM | 0.128 / — / — / — / — / — / —
50 | FOS-MELM | 0.483 / 1.032 / 2.020 / 3.130 / 4.224 / 5.226 / 6.206
50 | CF-OSELM-PRFF | 0.044 / 0.084 / 0.158 / 0.234 / 0.308 / 0.376 / 0.441 | 2.003 / 1.088 / 2.909 / 13.571
100 | FP-ELM | 0.226 / 0.413 / 0.814 / 1.216 / 1.656 / 2.158 / 2.556
100 | FGR-OSELM | 0.120 / 0.230 / 0.456 / 0.662 / 0.928 / 1.200 / 1.432
100 | AFGR-OSELM | 0.436 / — / — / — / — / — / —
100 | FOS-MELM | 1.994 / 4.026 / 8.010 / 11.600 / 15.725 / 21.788 / 23.466
100 | CF-OSELM-PRFF | 0.106 / 0.234 / 0.475 / 0.706 / 0.928 / 1.172 / 1.356 | 1.816 / 1.010 / 4.113 / 17.401
150 | FP-ELM | 0.584 / 1.156 / 2.092 / 3.043 / 3.898 / 4.964 / 5.966
150 | FGR-OSELM | 0.356 / 0.574 / 1.142 / 1.690 / 2.272 / 2.882 / 3.500
150 | AFGR-OSELM | 1.476 / — / — / — / — / — / —
150 | FOS-MELM | 3.956 / 7.668 / 15.660 / 23.923 / 31.005 / 40.389 / 46.937
150 | CF-OSELM-PRFF | 0.238 / 0.394 / 0.768 / 1.166 / 1.580 / 1.976 / 2.322 | 2.570 / 1.470 / 6.202 / 20.078
200 | FP-ELM | 0.876 / 1.766 / 3.608 / 5.327 / 6.996 / 8.762 / 10.334
200 | FGR-OSELM | 0.582 / 1.196 / 2.380 / 3.568 / 5.184 / 6.956 / 8.340
200 | AFGR-OSELM | 3.162 / — / — / — / — / — / —
200 | FOS-MELM | 6.192 / 13.152 / 26.146 / 40.028 / 51.933 / 64.998 / 76.605
200 | CF-OSELM-PRFF | 0.396 / 0.764 / 1.456 / 2.116 / 2.918 / 3.584 / 4.276 | 2.429 / 1.819 / 7.985 / 17.992
Note: ‘—’ represents meaninglessness in the case.
Table 8. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of EDTS.

#nodes | Algorithm | RMSE at 250 / 500 / 1000 / 1500 / 2000 / 2500 / 3000 steps
25 | FP-ELM | 32.28002 / 30.83888 / 37.61498 / 35.39764 / 35.90394 / 34.82232 / 34.55490
25 | FGR-OSELM | 32.31098 / 30.82570 / 37.50214 / 35.39156 / 35.93366 / 34.95880 / 3716.21866
25 | AFGR-OSELM | 34.73510 / 32.33776 / × / × / × / × / ×
25 | FOS-MELM | 38.49708 / 33.75844 / 44.35528 / 41.75432 / 42.49752 / 41.60332 / 40.84012
25 | CF-OSELM-PRFF | 32.10552 / 30.91740 / 37.46004 / 35.57738 / 35.97014 / 34.75104 / 34.66202
50 | FP-ELM | 35.79906 / 32.89564 / 37.40156 / 35.95780 / 35.89944 / 34.68656 / 34.32452
50 | FGR-OSELM | 35.83376 / 32.97100 / 37.41768 / 35.65734 / 36.19928 / 34.77612 / 34.34424
50 | AFGR-OSELM | 40.34600 / × / × / × / × / × / ×
50 | FOS-MELM | 36.28396 / 34.33358 / 37.61886 / 36.17690 / 36.30670 / 35.38694 / 34.46474
50 | CF-OSELM-PRFF | 35.77696 / 32.79708 / 37.38146 / 35.64052 / 35.90528 / 34.94106 / 34.20720
100 | FP-ELM | 29.90644 / 36.69486 / 36.63406 / 34.98748 / 36.24932 / 34.87696 / 33.51692
100 | FGR-OSELM | 29.82256 / 36.48022 / 36.66528 / 34.98040 / 36.17566 / 34.85032 / 33.46508
100 | AFGR-OSELM | 33.53562 / × / × / × / × / × / ×
100 | FOS-MELM | 30.26694 / 37.67300 / 37.31872 / 35.12966 / 36.24698 / 35.23746 / 33.92734
100 | CF-OSELM-PRFF | 29.82264 / 36.76328 / 36.81942 / 34.90620 / 36.10756 / 34.83982 / 33.60692
150 | FP-ELM | 29.80988 / 36.44814 / 36.63526 / 34.89114 / 36.06950 / 34.71408 / 33.54184
150 | FGR-OSELM | 29.80668 / 36.47544 / 36.72366 / 34.83888 / 36.08316 / 34.70846 / 33.62596
150 | AFGR-OSELM | 33.40442 / × / × / × / × / × / ×
150 | FOS-MELM | 30.24228 / 37.89150 / 37.38668 / 35.20784 / 36.33474 / 35.50346 / 33.81606
150 | CF-OSELM-PRFF | 29.85126 / 36.54124 / 36.61798 / 34.72458 / 36.06778 / 34.67280 / 33.53874
200 | FP-ELM | 29.82088 / 36.50352 / 36.67734 / 34.78988 / 36.02270 / 34.68296 / 33.36908
200 | FGR-OSELM | 29.80214 / 36.51458 / 36.63322 / 34.70966 / 36.00532 / 34.62506 / 33.35924
200 | AFGR-OSELM | 33.15900 / × / × / × / × / × / ×
200 | FOS-MELM | 30.31664 / 38.29058 / 37.60300 / 35.41638 / 36.64308 / 35.48306 / 34.02962
200 | CF-OSELM-PRFF | 29.82440 / 36.63616 / 36.66926 / 34.70436 / 36.01218 / 34.63022 / 33.45856
Note: ‘×’ represents nullification owing to the too large RMSE.
