1. Introduction
A recurrent neural network (RNN) is a powerful machine learning model with short-term memory and shared internal parameters. RNNs have advantages in the nonlinear prediction of time series, so they are often used in natural language processing and many kinds of time series prediction. However, RNNs require extensive computation and converge slowly. In 2001, Professor Jaeger proposed a new type of recurrent neural network, the echo state network (ESN) [1], which alleviates these problems and can predict the Mackey–Glass chaotic time series; its prediction accuracy on this task is 2400 times better than that of a standard RNN [2]. ESNs have been used in many fields, such as time series prediction [3,4], system modeling [5,6], dynamic pattern recognition [7], and so on. Compared with a traditional recurrent neural network, an ESN has the following characteristics:
It replaces the hidden layers of a recurrent neural network with a pool of randomly and sparsely connected neurons (the reservoir);
It can map the input vector to the state vector;
It uses the linear regression method or least-squares algorithm to obtain the output weights, simplifying the network training process and obtaining the global optimal value.
In addition to the output connection weight matrix, the global parameters of an ESN include the spectral radius of the reservoir connection weight matrix, the input and output feedback scaling, the leaking rate, and so on. These parameters are usually selected based on historical experience, which makes the global approximation capability of an ESN suboptimal.
Professor Jaeger proposed an improved ESN model, called the leaky integrator echo state network (leaky-ESN) [8]. A leaky-ESN can optimize the spectral radius of the internal connection weight matrix, the input scaling factor, and the leaking rate using the stochastic gradient descent (SGD) method, which improves its prediction accuracy to a certain extent. However, SGD is sensitive to the initial value; if the initial value is not suitable, it easily falls into a local minimum, so many scholars have used other methods to optimize the global parameters of an ESN. Zhang et al. [9] used an improved multiverse optimizer (MVO) to optimize the hyperparameters of a deep ESN, and the model obtained an engine fault recognition rate of 93%. In [10], the genetic algorithm (GA) was applied to a double-reservoir echo state network to adapt the global parameters of the ESN. Wang et al. [11] optimized the echo state network with the artificial fish swarm algorithm (AFSA), satisfying the control requirements of the polyvinyl chloride (PVC) polymerizing process. Zhang et al. [12] used a hybrid of the simulated annealing algorithm (SA) and the gray wolf optimization algorithm (GWO) to optimize an ESN for coverage optimization of wireless sensor networks, reducing the redundant distribution of sensor nodes and ensuring the smooth operation of the networks. Li et al. [13] used the particle swarm optimization (PSO) algorithm to optimize the output weight matrix and the number of neurons in the reservoir, and they proposed two boundary mutation strategies, verifying the effectiveness of the model in electrical load prediction. Salah et al. [14] used a PSO-ESN to predict the remaining service life of an engine. According to reference [15], output layer neurons only need a partial connection to the internal neurons in a reservoir, so Wang et al. proposed a binary particle swarm optimization (BPSO) algorithm to calculate the output connection weight matrix. Han et al. [16] used the quantum-behaved fruit fly optimization algorithm (QFFOA) to determine the size of the reservoir, the spectral radius, and the preprocessing of the original data. Many intelligent optimization algorithms have thus been used to optimize the parameters of an ESN; however, because the reservoir contains many nodes and the parameter search space is broad, the optimization effect still needs to be improved.
The gravitational search algorithm (GSA) is a heuristic optimization algorithm that simulates the phenomenon of gravity. In the algorithm, search agents represent the objects to be optimized, and their masses measure their performance. All agents interact with each other through gravitational forces, and lighter agents always move towards heavier ones. As the agents keep moving, the heaviest agent approaches the optimal position, i.e., the optimal solution to the problem. Compared with other traditional optimization algorithms, the GSA has high performance and good global search ability in solving nonlinear problems. However, it still has some problems, such as a slow convergence rate and a poor balance between exploration and exploitation.
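As background for the improvements proposed in Section 3, the following Python sketch illustrates one iteration of a basic GSA under commonly used settings; the exponential decay of G, the linearly shrinking kbest, and the specific constants are illustrative assumptions rather than the exact formulation of this paper.

```python
import numpy as np

def gsa_step(X, V, fitness, t, T, G0=100.0, alpha=20.0, eps=1e-12):
    """One iteration of a basic GSA (minimization); X, V are (N, D) positions and velocities."""
    N, D = X.shape
    G = G0 * np.exp(-alpha * t / T)                   # gravitational constant decays over the run
    best, worst = fitness.min(), fitness.max()
    m = (worst - fitness) / (worst - best + eps)      # better (smaller) fitness -> heavier mass
    M = m / (m.sum() + eps)                           # normalized masses
    kbest = max(1, int(round(N - (N - 1) * t / T)))   # number of attracting agents shrinks linearly
    heavy = np.argsort(fitness)[:kbest]               # only the kbest heaviest agents exert force
    acc = np.zeros_like(X)
    for i in range(N):
        for j in heavy:
            if j != i:
                R = np.linalg.norm(X[i] - X[j])
                # dividing the force by the agent's own mass cancels it, leaving the acceleration
                acc[i] += np.random.rand(D) * G * M[j] * (X[j] - X[i]) / (R + eps)
    V = np.random.rand(N, D) * V + acc                # velocity keeps a random fraction of its history
    X = X + V                                         # position update
    return X, V
```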
The rest of the paper is organized as follows. Section 2 presents a brief introduction to the leaky-ESN and the GSA. Section 3 analyzes the problems that exist in the exploration and exploitation phases of the GSA and proposes an improved GSA (IGSA). The simulation results and analysis of the leaky-ESN optimized by the IGSA are presented in Section 4. Finally, the conclusion is presented in Section 5.
3. The Improved Gravitational Search Algorithm (IGSA)
All optimization algorithms based on population behavior have two important abilities: the ability to explore the wide search space, namely exploration, and the ability to converge near the most promising candidate solution, namely exploitation. Exploration weakens and exploitation strengthens gradually with each iteration. In the GSA, according to Equations (10) and (11), a high value of G guarantees the exploration ability in the initial stage. As the algorithm runs, G continues to decrease, the movement of the agents slows down, exploration weakens, and exploitation strengthens. Unfortunately, kbest also decreases linearly over time, so there are fewer and fewer attracting agents, which weakens exploration while exploitation does not necessarily strengthen.
Figure 2 shows an example case, in which an arrow from agent m to agent n indicates that agent m is attracted by agent n. As can be seen, A5 is the agent with the best fitness value, so A5 always attracts the other agents to move toward it. However, A5 is also attracted by these agents and gradually moves away from the optimum, towards the center of all agents. After a period of time, the number of attracting agents keeps decreasing and the strength of gravity decreases, which weakens the exploration ability, slows the convergence rate in the exploitation phase, and prevents the system from converging near the optimum. To solve these problems, we put forward the following solutions:
3.1. The Adaptive Gbest and Elite-Agent-Guided GSA
This method includes two modifications: first, the location of the agent with the best global fitness (gbest) is saved and used to speed up the exploration phase; second, when there are too few attracting agents, which results in slow convergence, a fixed number of elite agents is used to provide attraction and speed up the exploitation phase.
Figure 3 shows the movement behavior of the agents after the gbest and the elite agents are introduced. As shown in the figure, the gbest provides an additional attraction to the other agents, helping them move toward the best global position, which can help a suboptimal agent to surpass the current optimal agent and become the new gbest in the next iteration. In general, the 20% of agents with the best fitness values in each iteration are selected as elite agents; when the kbest value is less than the number of elite agents, the elite agents attract the others. As shown in Figure 3b, A4 and A5 are elite agents. When kbest < 2, the elite agents attract the other agents, ensuring that the lighter agents can quickly move toward the heavier agents and improving the convergence rate so that, finally, the optimal solution can be found.
When kbest is less than the number of elite agents, Equation (10) is changed to Equation (15):

$$F_i^d(t) = \sum_{j \in Elite,\ j \neq i} rand_j \, F_{ij}^d(t) \qquad (15)$$

where $Elite$ represents the set of elite agents.
The modified velocity and position formulas using the gbest are as follows:

$$v_i(t+1) = r \, v_i(t) + c_1 \, a_i(t) + c_2 \, r \, \big(gbest - x_i(t)\big) \qquad (16)$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (17)$$

where $v_i(t)$ is the velocity of agent i at iteration t, $r$ is a random number within 0 to 1, $c_1$ and $c_2$ are acceleration coefficients, $a_i(t)$ is the acceleration of agent i at iteration t, and $gbest$ is the location of the global optimal solution.
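A minimal sketch of the modified update of Equations (16) and (17) is given below, assuming the acceleration has already been computed from the gravitational (or elite-agent) forces; the coefficient values are illustrative.

```python
import numpy as np

def gbest_guided_update(X, V, A, gbest, c1=0.5, c2=0.5):
    """Velocity/position update with an extra pull toward gbest (sketch of Eqs. (16)-(17)).
    X, V, A: (N, D) positions, velocities, and accelerations; gbest: (D,) best position so far."""
    r = np.random.rand(*X.shape)                 # random number in [0, 1]
    V = r * V + c1 * A + c2 * r * (gbest - X)    # Eq. (16): memory + GSA acceleration + gbest attraction
    X = X + V                                    # Eq. (17)
    return X, V
```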
3.2. The Differential Mutation Strategy
As can be seen from the above, the better agents may become the new global optimal solution through the guidance of the gbest, while the agents with poor fitness can only gradually approach the current optimal position under the guidance of the gbest and the elite agents, and it is difficult for them to become the best agent after one iteration. To increase the probability of an inferior agent becoming the best agent in the next iteration and to improve its contribution to the optimization results, this paper proposes a differential mutation strategy.
The agents are sorted according to fitness; the half of the agents with better fitness are selected as the high-quality solutions, and the remaining agents are the inferior solutions. After the mutation operation of Equation (18), we obtain the new agents:

$$v_i = x_{r1} + F \, (x_{r2} - x_{r3}) \qquad (18)$$

where $v_i$ is the candidate agent after mutation, $x_{r1}$ is an agent randomly selected from the inferior solutions, $x_{r2}$ and $x_{r3}$ are different agents randomly selected from the high-quality solutions, $F$ controls the scale of the difference between agents, and, usually, $F \in [0, 2]$.
By Equation (19), $x_i$ is changed to $u_i$ after crossover:

$$u_{i,j} = \begin{cases} v_{i,j}, & \text{if } rand_j \le CR \ \text{or} \ j = j_{rand} \\ x_{i,j}, & \text{otherwise} \end{cases} \qquad (19)$$

where $x_i$ is agent i from the inferior solutions, $CR$ is the crossover probability, and $j_{rand}$ is a random integer within the range of the dimension, D, which ensures that at least one dimension value comes from the agent generated by the mutation. The $u_i$ values are the new agents produced by the differential mutation strategy, and they form the new population together with the high-quality solutions.
In the process of generating new agents, the mutation and crossover operations make full use of combinations of better and inferior agents, increasing both the probability of producing better agents and the diversity of the population, which improves the global optimization ability of the algorithm. In the early stages of the algorithm, the agents are far apart, so the differential mutation strategy gives the algorithm a strong global search ability and the ability to escape local optima. In the later stages, the agents are close to each other, so the differential mutation strategy improves the local search ability and speeds up the convergence rate.
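A compact sketch of the mutation and crossover steps of Equations (18) and (19), applied to the inferior half of the population, is shown below; the values of F and the crossover probability CR are illustrative.

```python
import numpy as np

def differential_mutation(pop, fitness, F=0.5, CR=0.9):
    """Regenerate the inferior half of the population by mutation and crossover (sketch)."""
    N, D = pop.shape
    order = np.argsort(fitness)                    # ascending fitness: best agents first
    good, bad = order[: N // 2], order[N // 2:]
    new_pop = pop.copy()
    for i in bad:
        r1 = np.random.choice(bad)                                  # x_r1: from the inferior solutions
        r2, r3 = np.random.choice(good, size=2, replace=False)      # x_r2, x_r3: from the high-quality solutions
        v = pop[r1] + F * (pop[r2] - pop[r3])                       # mutation, Eq. (18)
        j_rand = np.random.randint(D)              # forces at least one dimension from the mutant
        mask = np.random.rand(D) <= CR
        mask[j_rand] = True
        new_pop[i] = np.where(mask, v, pop[i])     # crossover, Eq. (19)
    return new_pop
```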
3.3. IGSA Evaluation
To evaluate the performance and effectiveness of the IGSA, we applied it to 13 benchmark functions [18]. Table 1 shows the unimodal test functions, which mainly test the convergence rate rather than the final results. Table 2 shows the multimodal test functions, which have many local minima, so obtaining the global optimal value is important for them. The minimum value of $F_8$ was $-418.9829 \times n$; the minimum values of the other functions were zero. The value n is the dimension of the functions in Table 1 and Table 2.
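For reference, two representative members of this widely used benchmark suite are written below as Python functions: a unimodal sphere function and the multimodal Rastrigin function (the exact 13 functions used here follow [18]).

```python
import numpy as np

def sphere(x):
    """Unimodal benchmark; global minimum 0 at x = 0."""
    return np.sum(x ** 2)

def rastrigin(x):
    """Multimodal benchmark with many local minima; global minimum 0 at x = 0."""
    return np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)
```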
We applied the IGSA to these benchmark functions and compared the results with those of the GSA and PSO. In all algorithms, the same population size and dimension were used, and the total number of iterations was 1000. In PSO, the positive constants $c_1$ and $c_2$ were both 0.5, together with an inertia factor $w$. In the GSA and the IGSA, the initial gravitational constant $G_0$ was set to 100. For each algorithm, every function was run 30 times, and the results are shown in Table 3 and Table 4.
As shown in Table 3, the IGSA provided better results than the GSA and PSO for the functions in Table 1, and this was especially visible for some of them. Figure 4, Figure 5 and Figure 6 show the convergence behavior of all the algorithms. According to these figures, the IGSA was the fastest to find the global optimum, which was due to the extra attraction provided by the gbest and the elite agents.
For the multimodal functions, the final result was the most important criterion, as it reflects an algorithm's ability to escape from local optima. As shown in Table 4, thanks to the differential mutation strategy, the IGSA had more chances to escape from local optima, so it obtained better results. The convergence behavior of some functions is shown in Figure 7 and Figure 8.
3.4. The Process of Reservoir Parameter Selection Based on the IGSA
Due to the uncertainty of reservoir parameters in the leaky-ESN, the work in this paper used the improved gravitational search algorithm to optimize reservoir parameters and proposed a reservoir parameter selection model based on the IGSA, which gave the leaky-ESN better generalization ability and prediction performance.
The process of reservoir parameter selection was as follows: first, the initial population was generated from the reservoir parameters and the fitness of all agents was calculated; the population was then updated using the gbest and the elite agents, and the inferior agents were updated using the differential mutation strategy, eventually yielding a new population. When the termination condition was met, the algorithm stopped and output the corresponding parameters; otherwise, the search continued in the parameter space. The process of reservoir parameter selection is shown in Figure 9.
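The following Python sketch outlines this selection loop; each agent encodes the three reservoir parameters, and its fitness is the training NRMSE of the corresponding leaky-ESN. The search bounds and the helper functions `leaky_esn_nrmse` and `gsa_acceleration` are placeholders rather than this paper's actual implementation, while `gbest_guided_update` and `differential_mutation` are the sketches given in Section 3.

```python
import numpy as np

# Assumed search bounds for (leaking rate a, spectral radius rho, input scaling s_in)
LOWER = np.array([0.05, 0.10, 0.10])
UPPER = np.array([1.00, 0.99, 1.00])

def select_reservoir_params(train_data, n_agents=50, n_iter=100):
    """IGSA-based reservoir parameter selection (sketch)."""
    X = LOWER + np.random.rand(n_agents, 3) * (UPPER - LOWER)   # initial population of parameter vectors
    V = np.zeros_like(X)
    for t in range(n_iter):
        # fitness of each agent: training NRMSE of a leaky-ESN built with its parameters (placeholder)
        fit = np.array([leaky_esn_nrmse(train_data, a, rho, s_in) for a, rho, s_in in X])
        gbest = X[np.argmin(fit)].copy()
        acc = gsa_acceleration(X, fit, t, n_iter)        # placeholder: kbest/elite force computation (Section 3.1)
        X, V = gbest_guided_update(X, V, acc, gbest)     # gbest-guided update (Eqs. (16)-(17))
        fit = np.array([leaky_esn_nrmse(train_data, a, rho, s_in) for a, rho, s_in in X])
        X = differential_mutation(X, fit)                # regenerate the inferior half (Section 3.2)
        X = np.clip(X, LOWER, UPPER)                     # keep parameters inside the search space
    fit = np.array([leaky_esn_nrmse(train_data, a, rho, s_in) for a, rho, s_in in X])
    return X[np.argmin(fit)]                             # optimal (a, rho, s_in)
```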
Firstly, the input time series was preprocessed, which included noise removal and phase space reconstruction; the phase space reconstruction extends the time series into a high-dimensional space and fully reveals the hidden information. Secondly, the reservoir parameters were initialized, which included the input connection weight matrix, $W_{in}$, the reservoir connection weight matrix, $W$, and the leaky-ESN training model. Thirdly, the reservoir parameter selection model was used to obtain the optimal parameters, namely the leaking rate, $a$, the spectral radius, $\rho$, and the input scaling factor, $s_{in}$, yielding the leaky-ESN prediction model with the optimal parameters. Finally, the leaky-ESN prediction model was used to predict the time series. The overall framework for time series prediction with the leaky-ESN is shown in Figure 10.
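To make the framework concrete, a minimal leaky-ESN sketch is given below with the standard leaky-integrator state update and a ridge-regression readout; the weight initialization, the regularization constant, and the omission of output feedback are assumptions consistent with the experiments in Section 4, not the exact implementation.

```python
import numpy as np

class LeakyESN:
    """Minimal leaky-ESN: leaky-integrator reservoir with a ridge-regression readout (sketch)."""
    def __init__(self, n_in, n_res, a=0.3, rho=0.9, s_in=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = a                                                   # leaking rate
        self.W_in = s_in * (rng.random((n_res, n_in + 1)) - 0.5)     # input weights (first column = bias)
        W = rng.random((n_res, n_res)) - 0.5
        self.W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))    # rescale to spectral radius rho
        self.W_out = None
        self.last_state = np.zeros(n_res)

    def _update(self, x, u):
        u1 = np.concatenate(([1.0], np.atleast_1d(u)))               # bias + input
        x_tilde = np.tanh(self.W_in @ u1 + self.W @ x)
        return (1.0 - self.a) * x + self.a * x_tilde                 # leaky-integrator state update

    def fit(self, U, Y, washout=100, ridge=1e-8):
        x = np.zeros(self.W.shape[0])
        states = []
        for u in U:
            x = self._update(x, u)
            states.append(x.copy())
        X = np.array(states)[washout:]
        Yw = np.asarray(Y)[washout:]
        # ridge readout: W_out = Y^T X (X^T X + lambda I)^-1
        self.W_out = Yw.T @ X @ np.linalg.inv(X.T @ X + ridge * np.eye(X.shape[1]))
        self.last_state = x
        return self

    def predict(self, U):
        x = self.last_state.copy()
        out = []
        for u in U:
            x = self._update(x, u)
            out.append(self.W_out @ x)
        return np.array(out)
```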
4. Simulation and Analysis
In this section, we tested the performance of the leaky-ESN optimized by the IGSA on time series prediction problems. The leaky-ESN optimized by the IGSA was compared with the leaky-ESN optimized by a basic GSA, by the stochastic gradient descent (SGD) method, and by the particle swarm optimization (PSO) algorithm. All of these algorithms optimized the leaking rate, $a$, the spectral radius, $\rho$, and the input scaling factor, $s_{in}$, of the reservoir. In SGD, the learning rate was 0.0002. In PSO, the positive constants $c_1$ and $c_2$ were both 0.5, together with an inertia factor $w$. In the GSA and the IGSA, the initial gravitational constant $G_0$ was set to 100. For the following time series prediction problems, the lengths of the training data and the test data were both 10,000.
4.1. The Sine Time Series
The input time series was a sum of essentially incommensurate sines with periods ranging from approximately 6 to approximately 120 discrete time steps. The desired output was the input delayed by five time steps, i.e., this was a delay-5, short-term recall task. For SGD, the three groups of initial values given in reference [8] were used. For the IGSA, the GSA, and PSO, 50 initial individuals were generated, respectively, under the condition of satisfying the echo state property. In each model, the input connection weight matrix, $W_{in}$, and the reservoir connection weight matrix, $W$, remained unchanged after random generation, and the output feedback weight matrix was set to zero. Meanwhile, the three models had the same number of reservoir units.
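As an illustration of this setup, the sketch below builds a stand-in input from a sum of sines and the delay-5 target, then trains and tests the `LeakyESN` sketch from above; the specific periods and reservoir settings are hypothetical, not the values used in the actual experiment.

```python
import numpy as np

# Hypothetical periods between ~6 and ~120 steps; the true signal definition is that of reference [8]
periods = np.array([6.1, 11.7, 25.3, 58.9, 119.6])
n = np.arange(20200)
u = np.mean(np.sin(2.0 * np.pi * n[:, None] / periods), axis=1, keepdims=True)

delay = 5
U, Y = u[delay:], u[:-delay]                      # desired output = input delayed by five steps

esn = LeakyESN(n_in=1, n_res=100, a=0.3, rho=0.9, s_in=1.0)
esn.fit(U[:10000], Y[:10000])                     # 10,000 training points, as in the experiments
pred = esn.predict(U[10000:20000])                # 10,000 test points
nrmse = np.sqrt(np.mean((pred - Y[10000:20000]) ** 2)) / np.std(Y[10000:20000])
print(f"test NRMSE: {nrmse:.4f}")
```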
Figure 11 shows the training NRMSE of the four leaky-ESNs optimized by the IGSA, PSO, SGD, and the GSA. It can be seen that all four algorithms had a good optimization effect. SGD had the fastest convergence rate, but its NRMSE was the largest. The IGSA converged faster than PSO and the GSA, and its NRMSE was the smallest among the three algorithms. The predicted errors of the four algorithms for the sine time series are shown in Figure 12, and the prediction accuracy of the leaky-ESN optimized by the IGSA is shown in Figure 13. We found that the IGSA's predicted error was the smallest and that the IGSA achieved good prediction accuracy.
4.2. The Mackey–Glass Chaotic Time Series
The Mackey–Glass time series is a standard data set in the time series research field and has obvious nonlinear and nonstationary characteristics. Due to its chaotic nature, it is difficult to predict accurately. The discrete-time form of the Mackey–Glass chaotic time series is as follows:

$$x(t+1) = x(t) + \frac{a \, x(t-\tau)}{1 + x^{n}(t-\tau)} - b \, x(t) \qquad (21)$$

in which $a = 0.2$, $b = 0.1$, $n = 10$, and $\tau = 17$. The value $\tau$ denotes a delay factor, and when $\tau \geq 17$, the time series of Equation (21) has a chaotic property. The initial values of the SGD and the leaky-ESN parameters were the same as for the sine time series. The simulation results showed that the leaky-ESN optimized by the IGSA still performed well.
Figure 14 shows that the IGSA achieved the smallest training NRMSE. From Figure 15 and Figure 16, we found that the IGSA-optimized model had a smaller predicted error and approximated the real values well.
The above two simulation experiments showed that, compared with the leaky-ESNs optimized by SGD, PSO, and the GSA, the leaky-ESN optimized by the IGSA had better prediction performance.