1. Introduction
A recurrent neural network (RNN) is a powerful machine learning model with short-term memory and shared internal parameters. RNNs have advantages in the nonlinear prediction of time series, so they are often used in natural language processing and many kinds of time series prediction. However, RNNs require extensive computation and converge slowly. In 2001, Professor Jaeger proposed a new type of recurrent neural network, the echo state network (ESN) [1], which alleviates these problems and can predict the Mackey–Glass chaotic time series; its prediction accuracy on this task is 2400 times better than that of a standard RNN [2]. ESNs have been used in many fields, such as time series prediction [3,4], system modeling [5,6], dynamic pattern recognition [7], and so on. Compared with a traditional recurrent neural network, an ESN has the following characteristics:
It replaces the hidden layers of a recurrent neural network with a pool of randomly and sparsely connected neurons (the reservoir);
It can map the input vector to the state vector;
It uses the linear regression method or least-squares algorithm to obtain the output weights, simplifying the network training process and obtaining the global optimal value.
In addition to the output connection weight matrix, the global parameters of an ESN include the spectral radius of the reservoir connection weight matrix, the input and output feedback scaling, the leaking rate, and so on. These parameters are usually selected based on historical experience, which makes the global approximation capability of an ESN suboptimal.
Professor Jaeger proposed an improved ESN model, called the leaky integrator echo state network (leaky-ESN) [8]. A leaky-ESN can optimize the spectral radius of the internal connection weight matrix, the input scaling factor, and the leaking rate using the stochastic gradient descent (SGD) method, which improves its prediction accuracy to a certain extent. However, SGD is sensitive to the initial value; if the initial value is not suitable, it easily falls into a local minimum, so many scholars have used other methods to optimize the global parameters of an ESN. Zhang et al. [9] used an improved multiverse optimizer (MVO) to optimize the hyperparameters of a deep ESN, and the model obtained an engine fault recognition rate of 93%. In [10], the genetic algorithm (GA) was applied to a double-reservoir echo state network to adapt the global parameters of the ESN. Wang et al. [11] optimized the echo state network with the artificial fish swarm algorithm (AFSA), satisfying the control requirements of the polyvinyl chloride (PVC) polymerizing process. Zhang et al. [12] used a hybrid of the simulated annealing algorithm (SA) and the gray wolf optimization algorithm (GWO) to optimize an ESN for coverage optimization of wireless sensor networks, reducing the redundant distribution of sensor nodes and ensuring the smooth operation of the networks. Li et al. [13] used the particle swarm optimization (PSO) algorithm to optimize the output weight matrix and the number of neurons in the reservoir, and they proposed two boundary mutation strategies, verifying the effectiveness of the model in electrical load prediction. Salah et al. [14] used a PSO-ESN to predict the remaining service life of an engine. According to reference [15], output layer neurons only need a partial connection to the internal neurons in a reservoir, so Wang et al. proposed a binary particle swarm optimization (BPSO) algorithm to calculate the output connection weight matrix. Han et al. [16] used the quantum-behaved fruit fly optimization algorithm (QFFOA) to determine the size of the reservoir, the spectral radius, and the preprocessing of the original data. Many intelligent optimization algorithms have thus been used to optimize the parameters of an ESN; however, because the reservoir contains many nodes and the parameter search space is broad, the optimization effect still needs to be improved.
The gravitational search algorithm (GSA) is a heuristic optimization algorithm that simulates the phenomenon of gravity. In the algorithm, search agents represent the objects to be optimized, and their masses measure their performance. All agents interact with each other through gravitational forces, and lighter agents always move towards heavier ones. As the agents keep moving, the heaviest agent approaches the optimal position, i.e., the optimal solution to the problem. Compared with other traditional optimization algorithms, the GSA has high performance and good global search ability in solving nonlinear problems. However, it still has some problems, such as a slow convergence rate and a poor balance between exploration and exploitation.
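As background for the improvements proposed in Section 3, the following Python sketch illustrates one iteration of a basic GSA under commonly used settings; the exponential decay of G, the linearly shrinking kbest, and the specific constants are illustrative assumptions rather than the exact formulation of this paper.

```python
import numpy as np

def gsa_step(X, V, fitness, t, T, G0=100.0, alpha=20.0, eps=1e-12):
    """One iteration of a basic GSA (minimization); X, V are (N, D) positions and velocities."""
    N, D = X.shape
    G = G0 * np.exp(-alpha * t / T)                   # gravitational constant decays over the run
    best, worst = fitness.min(), fitness.max()
    m = (worst - fitness) / (worst - best + eps)      # better (smaller) fitness -> heavier mass
    M = m / (m.sum() + eps)                           # normalized masses
    kbest = max(1, int(round(N - (N - 1) * t / T)))   # number of attracting agents shrinks linearly
    heavy = np.argsort(fitness)[:kbest]               # only the kbest heaviest agents exert force
    acc = np.zeros_like(X)
    for i in range(N):
        for j in heavy:
            if j != i:
                R = np.linalg.norm(X[i] - X[j])
                # dividing the force by the agent's own mass cancels it, leaving the acceleration
                acc[i] += np.random.rand(D) * G * M[j] * (X[j] - X[i]) / (R + eps)
    V = np.random.rand(N, D) * V + acc                # velocity keeps a random fraction of its history
    X = X + V                                         # position update
    return X, V
```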
The rest of the paper is organized as follows. Section 2 presents a brief introduction to the leaky-ESN and the GSA. Section 3 analyzes the problems that exist in the exploration and exploitation phases of the GSA and proposes an improved GSA (IGSA). The simulation results and analysis of the leaky-ESN optimized by the IGSA are presented in Section 4. Finally, the conclusion is presented in Section 5.
3. The Improved Gravitational Search Algorithm (IGSA)
All optimization algorithms based on population behavior have two important abilities: the ability to explore the wide search space, namely exploration, and the ability to converge near the most promising candidate solution, namely exploitation. Exploration weakens and exploitation strengthens gradually with each iteration. In the GSA, according to Equations (10) and (11), a high value of G guarantees the exploration ability in the initial stage. As the algorithm runs, G continues to decrease, the movement of the agents slows down, exploration weakens, and exploitation strengthens. Unfortunately, kbest also decreases linearly over time, so there are fewer and fewer attracting agents, which weakens exploration while exploitation does not necessarily strengthen.
Figure 2 shows an example case, in which an arrow from agent m to agent n indicates that agent m is attracted by agent n. As can be seen, A5 is the agent with the best fitness value, so A5 always attracts the other agents to move toward it. However, A5 is also attracted by these agents and gradually moves away from the optimum, towards the center of all agents. After a period of time, the number of attracting agents keeps decreasing and the strength of gravity decreases, which weakens the exploration ability, slows the convergence rate in the exploitation phase, and prevents the system from converging near the optimum. To solve these problems, we put forward the following solutions:
3.1. The Adaptive Gbest and Elite-Agent-Guided GSA
This method includes two modifications: first, the location of the agent with the best global fitness (gbest) is saved and used to speed up the exploration phase; second, when there are too few attracting agents, which results in slow convergence, a fixed number of elite agents is used to provide attraction and speed up the exploitation phase.
Figure 3 shows the movement behavior of the agents after the gbest and the elite agents are introduced. As shown in the figure, the gbest provides an additional attraction to the other agents, helping them move toward the best global position, which can help a suboptimal agent to surpass the current optimal agent and become the new gbest in the next iteration. In general, the 20% of agents with the best fitness values in each iteration are selected as elite agents; when the kbest value is less than the number of elite agents, the elite agents attract the others. As shown in Figure 3b, A4 and A5 are elite agents. When kbest < 2, the elite agents attract the other agents, ensuring that the lighter agents can quickly move toward the heavier agents and improving the convergence rate so that, finally, the optimal solution can be found.
When kbest is less than the number of elite agents, Equation (10) is changed to Equation (15):

$$F_i^d(t) = \sum_{j \in Elite,\ j \neq i} rand_j \, F_{ij}^d(t) \qquad (15)$$

where $Elite$ represents the set of elite agents.
The modified velocity and position formulas using the gbest are as follows:

$$v_i(t+1) = r \, v_i(t) + c_1 \, a_i(t) + c_2 \, r \, \big(gbest - x_i(t)\big) \qquad (16)$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (17)$$

where $v_i(t)$ is the velocity of agent i at iteration t, $r$ is a random number within 0 to 1, $c_1$ and $c_2$ are acceleration coefficients, $a_i(t)$ is the acceleration of agent i at iteration t, and $gbest$ is the location of the global optimal solution.
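A minimal sketch of the modified update of Equations (16) and (17) is given below, assuming the acceleration has already been computed from the gravitational (or elite-agent) forces; the coefficient values are illustrative.

```python
import numpy as np

def gbest_guided_update(X, V, A, gbest, c1=0.5, c2=0.5):
    """Velocity/position update with an extra pull toward gbest (sketch of Eqs. (16)-(17)).
    X, V, A: (N, D) positions, velocities, and accelerations; gbest: (D,) best position so far."""
    r = np.random.rand(*X.shape)                 # random number in [0, 1]
    V = r * V + c1 * A + c2 * r * (gbest - X)    # Eq. (16): memory + GSA acceleration + gbest attraction
    X = X + V                                    # Eq. (17)
    return X, V
```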
3.2. The Differential Mutation Strategy
As can be seen from the above, the better agents may become the new global optimal solution through the guidance of the gbest, while the agents with poor fitness can only gradually approach the current optimal position under the guidance of the gbest and the elite agents, and it is difficult for them to become the best agent after one iteration. To increase the probability of an inferior agent becoming the best agent in the next iteration and to improve its contribution to the optimization results, this paper proposes a differential mutation strategy.
The agents are sorted according to fitness; the half of the agents with better fitness are selected as the high-quality solutions, and the remaining agents are the inferior solutions. After the mutation operation of Equation (18), we obtain the new agents:

$$v_i = x_{r1} + F \, (x_{r2} - x_{r3}) \qquad (18)$$

where $v_i$ is the candidate agent after mutation, $x_{r1}$ is an agent randomly selected from the inferior solutions, $x_{r2}$ and $x_{r3}$ are different agents randomly selected from the high-quality solutions, $F$ controls the scale of the difference between agents, and, usually, $F \in [0, 2]$.
By Equation (19), $x_i$ is changed to $u_i$ after crossover:

$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } rand_j \le CR \ \text{or} \ j = j_{rand} \\ x_{i,j}, & \text{otherwise} \end{cases} \qquad (19)$$

where $x_i$ is agent i from the inferior solutions, $CR$ is the crossover probability, and $j_{rand}$ is a random integer within the range of the dimension, D, which ensures that at least one dimension value comes from the agent generated by the mutation. The $u_i$ values are the new agents produced by the differential mutation strategy, and they form the new population together with the high-quality solutions.
In the process of generating new agents, the mutation and crossover operations make full use of combinations of better and inferior agents, increasing both the probability of producing better agents and the diversity of the population, which improves the global optimization ability of the algorithm. In the early stages of the algorithm, the agents are far apart, so the differential mutation strategy gives the algorithm a strong global search ability and the ability to escape local optima. In the later stages, the agents are close to each other, so the differential mutation strategy improves the local search ability and speeds up the convergence rate.
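A compact sketch of the mutation and crossover steps of Equations (18) and (19), applied to the inferior half of the population, is shown below; the values of F and the crossover probability CR are illustrative.

```python
import numpy as np

def differential_mutation(pop, fitness, F=0.5, CR=0.9):
    """Regenerate the inferior half of the population by mutation and crossover (sketch)."""
    N, D = pop.shape
    order = np.argsort(fitness)                    # ascending fitness: best agents first
    good, bad = order[: N // 2], order[N // 2:]
    new_pop = pop.copy()
    for i in bad:
        r1 = np.random.choice(bad)                                  # x_r1: from the inferior solutions
        r2, r3 = np.random.choice(good, size=2, replace=False)      # x_r2, x_r3: from the high-quality solutions
        v = pop[r1] + F * (pop[r2] - pop[r3])                       # mutation, Eq. (18)
        j_rand = np.random.randint(D)              # forces at least one dimension from the mutant
        mask = np.random.rand(D) <= CR
        mask[j_rand] = True
        new_pop[i] = np.where(mask, v, pop[i])     # crossover, Eq. (19)
    return new_pop
```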
3.3. IGSA Evaluation
To evaluate the performance and effectiveness of the IGSA, we applied it to 13 benchmark functions [18]. Table 1 shows the unimodal test functions, which mainly test the convergence rate rather than the final results. Table 2 shows the multimodal test functions, which have many local minima, so obtaining the global optimal value is important for them. The minimum value of $F_8$ was $-418.9829 \times n$; the minimum values of the other functions were zero. The value n is the dimension of the functions in Table 1 and Table 2.
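For reference, two representative members of this widely used benchmark suite are written below as Python functions: a unimodal sphere function and the multimodal Rastrigin function (the exact 13 functions used here follow [18]).

```python
import numpy as np

def sphere(x):
    """Unimodal benchmark; global minimum 0 at x = 0."""
    return np.sum(x ** 2)

def rastrigin(x):
    """Multimodal benchmark with many local minima; global minimum 0 at x = 0."""
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)
```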
We applied the IGSA to these benchmark functions and compared the results with those of the GSA and PSO. In all algorithms, the same population size and dimension were used, and the total number of iterations was 1000. In PSO, the positive constants $c_1$ and $c_2$ were both 0.5, together with an inertia factor $w$. In the GSA and the IGSA, the initial gravitational constant $G_0$ was set to 100. For each algorithm, every function was run 30 times, and the results are shown in Table 3 and Table 4.
As shown in Table 3, the IGSA provided better results than the GSA and PSO for the functions in Table 1, and this was especially visible for some of them. Figure 4, Figure 5 and Figure 6 show the convergence behavior of all the algorithms. According to these figures, the IGSA was the fastest to find the global optimum, which was due to the extra attraction provided by the gbest and the elite agents.
For the multimodal functions, the final result was the most important criterion, as it reflects an algorithm's ability to escape from local optima. As shown in Table 4, thanks to the differential mutation strategy, the IGSA had more chances to escape from local optima, so it obtained better results. The convergence behavior of some functions is shown in Figure 7 and Figure 8.
3.4. The Process of Reservoir Parameter Selection Based on the IGSA
Due to the uncertainty of reservoir parameters in the leaky-ESN, the work in this paper used the improved gravitational search algorithm to optimize reservoir parameters and proposed a reservoir parameter selection model based on the IGSA, which gave the leaky-ESN better generalization ability and prediction performance.
The process of reservoir parameter selection was as follows: first, the initial population was generated from the reservoir parameters and the fitness of all agents was calculated; the population was then updated using the gbest and the elite agents, and the inferior agents were updated using the differential mutation strategy, eventually yielding a new population. When the termination condition was met, the algorithm stopped and output the corresponding parameters; otherwise, the search continued in the parameter space. The process of reservoir parameter selection is shown in Figure 9.
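The following Python sketch outlines this selection loop; each agent encodes the three reservoir parameters, and its fitness is the training NRMSE of the corresponding leaky-ESN. The search bounds and the helper functions `leaky_esn_nrmse` and `gsa_acceleration` are placeholders rather than this paper's actual implementation, while `gbest_guided_update` and `differential_mutation` are the sketches given in Section 3.

```python
import numpy as np

# Assumed search bounds for (leaking rate a, spectral radius rho, input scaling s_in)
LOWER = np.array([0.05, 0.10, 0.10])
UPPER = np.array([1.00, 0.99, 1.00])

def select_reservoir_params(train_data, n_agents=50, n_iter=100):
    """IGSA-based reservoir parameter selection (sketch)."""
    X = LOWER + np.random.rand(n_agents, 3) * (UPPER - LOWER)   # initial population of parameter vectors
    V = np.zeros_like(X)
    for t in range(n_iter):
        # fitness of each agent: training NRMSE of a leaky-ESN built with its parameters (placeholder)
        fit = np.array([leaky_esn_nrmse(train_data, a, rho, s_in) for a, rho, s_in in X])
        gbest = X[np.argmin(fit)].copy()
        acc = gsa_acceleration(X, fit, t, n_iter)        # placeholder: kbest/elite force computation (Section 3.1)
        X, V = gbest_guided_update(X, V, acc, gbest)     # gbest-guided update (Eqs. (16)-(17))
        fit = np.array([leaky_esn_nrmse(train_data, a, rho, s_in) for a, rho, s_in in X])
        X = differential_mutation(X, fit)                # regenerate the inferior half (Section 3.2)
        X = np.clip(X, LOWER, UPPER)                     # keep parameters inside the search space
    fit = np.array([leaky_esn_nrmse(train_data, a, rho, s_in) for a, rho, s_in in X])
    return X[np.argmin(fit)]                             # optimal (a, rho, s_in)
```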
Firstly, the input time series was preprocessed, which included noise removal and phase space reconstruction; the phase space reconstruction extends the time series into a high-dimensional space and fully reveals the hidden information. Secondly, the reservoir parameters were initialized, which included the input connection weight matrix, $W_{in}$, the reservoir connection weight matrix, $W$, and the leaky-ESN training model. Thirdly, the reservoir parameter selection model was used to obtain the optimal parameters, namely the leaking rate, $a$, the spectral radius, $\rho$, and the input scaling factor, $s_{in}$, yielding the leaky-ESN prediction model with the optimal parameters. Finally, the leaky-ESN prediction model was used to predict the time series. The overall framework for time series prediction with the leaky-ESN is shown in Figure 10.
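To make the framework concrete, a minimal leaky-ESN sketch is given below with the standard leaky-integrator state update and a ridge-regression readout; the weight initialization, the regularization constant, and the omission of output feedback are assumptions consistent with the experiments in Section 4, not the exact implementation.

```python
import numpy as np

class LeakyESN:
    """Minimal leaky-ESN: leaky-integrator reservoir with a ridge-regression readout (sketch)."""
    def __init__(self, n_in, n_res, a=0.3, rho=0.9, s_in=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = a                                                   # leaking rate
        self.W_in = s_in * (rng.random((n_res, n_in + 1)) - 0.5)     # input weights (first column = bias)
        W = rng.random((n_res, n_res)) - 0.5
        self.W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))    # rescale to spectral radius rho
        self.W_out = None
        self.last_state = np.zeros(n_res)

    def _update(self, x, u):
        u1 = np.concatenate(([1.0], np.atleast_1d(u)))               # bias + input
        x_tilde = np.tanh(self.W_in @ u1 + self.W @ x)
        return (1.0 - self.a) * x + self.a * x_tilde                 # leaky-integrator state update

    def fit(self, U, Y, washout=100, ridge=1e-8):
        x = np.zeros(self.W.shape[0])
        states = []
        for u in U:
            x = self._update(x, u)
            states.append(x.copy())
        X = np.array(states)[washout:]
        Yw = np.asarray(Y)[washout:]
        # ridge readout: W_out = Y^T X (X^T X + lambda I)^-1
        self.W_out = Yw.T @ X @ np.linalg.inv(X.T @ X + ridge * np.eye(X.shape[1]))
        self.last_state = x
        return self

    def predict(self, U):
        x = self.last_state.copy()
        out = []
        for u in U:
            x = self._update(x, u)
            out.append(self.W_out @ x)
        return np.array(out)
```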
4. Simulation and Analysis
In this section, we tested the performance of the leaky-ESN optimized by the IGSA on time series prediction problems. The leaky-ESN optimized by the IGSA was compared with the leaky-ESN optimized by a basic GSA, by the stochastic gradient descent (SGD) method, and by the particle swarm optimization (PSO) algorithm. All of these algorithms optimized the leaking rate, $a$, the spectral radius, $\rho$, and the input scaling factor, $s_{in}$, of the reservoir. In SGD, the learning rate was 0.0002. In PSO, the positive constants $c_1$ and $c_2$ were both 0.5, together with an inertia factor $w$. In the GSA and the IGSA, the initial gravitational constant $G_0$ was set to 100. For the following time series prediction problems, the lengths of the training data and the test data were both 10,000.
4.1. The Sine Time Series
The input time series was a sum of essentially incommensurate sines with periods ranging from approximately 6 to approximately 120 discrete time steps. The desired output was the input delayed by five time steps, i.e., this was a delay-5, short-term recall task. For SGD, the three groups of initial values given in reference [8] were used. For the IGSA, the GSA, and PSO, 50 initial individuals were generated, respectively, under the condition of satisfying the echo state property. In each model, the input connection weight matrix, $W_{in}$, and the reservoir connection weight matrix, $W$, remained unchanged after random generation, and the output feedback weight matrix was set to zero. Meanwhile, the three models had the same number of reservoir units.
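As an illustration of this setup, the sketch below builds a stand-in input from a sum of sines and the delay-5 target, then trains and tests the `LeakyESN` sketch from above; the specific periods and reservoir settings are hypothetical, not the values used in the actual experiment.

```python
import numpy as np

# Hypothetical periods between ~6 and ~120 steps; the true signal definition is that of reference [8]
periods = np.array([6.1, 11.7, 25.3, 58.9, 119.6])
n = np.arange(20200)
u = np.mean(np.sin(2.0 * np.pi * n[:, None] / periods), axis=1, keepdims=True)

delay = 5
U, Y = u[delay:], u[:-delay]                      # desired output = input delayed by five steps

esn = LeakyESN(n_in=1, n_res=100, a=0.3, rho=0.9, s_in=1.0)
esn.fit(U[:10000], Y[:10000])                     # 10,000 training points, as in the experiments
pred = esn.predict(U[10000:20000])                # 10,000 test points
nrmse = np.sqrt(np.mean((pred - Y[10000:20000]) ** 2)) / np.std(Y[10000:20000])
print(f"test NRMSE: {nrmse:.4f}")
```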
Figure 11 shows the training NRMSE of the four leaky-ESNs optimized by the IGSA, PSO, SGD, and the GSA. It can be seen that all four algorithms had a good optimization effect. SGD had the fastest convergence rate, but its NRMSE was the largest. The IGSA converged faster than PSO and the GSA, and its NRMSE was the smallest among the three algorithms. The predicted errors of the four algorithms for the sine time series are shown in Figure 12, and the prediction accuracy of the leaky-ESN optimized by the IGSA is shown in Figure 13. We found that the IGSA's predicted error was the smallest and that the IGSA achieved good prediction accuracy.
4.2. The Mackey–Glass Chaotic Time Series
The Mackey–Glass time series is a standard data set in the time series research field and has obvious nonlinear and nonstationary characteristics. Due to its chaotic nature, it is difficult to predict accurately. The discrete-time form of the Mackey–Glass chaotic time series is as follows:

$$x(t+1) = x(t) + \frac{a \, x(t-\tau)}{1 + x^{n}(t-\tau)} - b \, x(t) \qquad (21)$$

in which $a = 0.2$, $b = 0.1$, $n = 10$, and $\tau = 17$. The value $\tau$ denotes a delay factor, and when $\tau \geq 17$, the time series of Equation (21) has a chaotic property. The initial values of the SGD and the leaky-ESN parameters were the same as for the sine time series. The simulation results showed that the leaky-ESN optimized by the IGSA still performed well.
Figure 14 shows that the IGSA achieved the smallest training NRMSE. From Figure 15 and Figure 16, we found that the IGSA-optimized model had a smaller predicted error and approximated the real values well.
The above two simulation experiments showed that, compared with the leaky-ESNs optimized by SGD, PSO, and the GSA, the leaky-ESN optimized by the IGSA had better prediction performance.