A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning

Fan, Yanyan; Zhang, Yu; Guo, Baosu; Luo, Xiaoyuan; Peng, Qingjin; Jin, Zhenlin

doi:10.3390/math10163019

Open AccessArticle

A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning

by

Yanyan Fan

¹,

Yu Zhang

¹,

Baosu Guo

¹,

Xiaoyuan Luo

²

,

Qingjin Peng

³

and

Zhenlin Jin

^1,*

¹

School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China

²

School of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China

³

Department of Mechanical Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada

^*

Author to whom correspondence should be addressed.

Mathematics 2022, 10(16), 3019; https://doi.org/10.3390/math10163019

Submission received: 9 July 2022 / Revised: 11 August 2022 / Accepted: 12 August 2022 / Published: 22 August 2022

(This article belongs to the Section Engineering Mathematics)

Download

Browse Figures

Versions Notes

Abstract

:

Deep learning has been widely used in different fields such as computer vision and speech processing. The performance of deep learning algorithms is greatly affected by their hyperparameters. For complex machine learning models such as deep neural networks, it is difficult to determine their hyperparameters. In addition, existing hyperparameter optimization algorithms easily converge to a local optimal solution. This paper proposes a method for hyperparameter optimization that combines the Sparrow Search Algorithm and Particle Swarm Optimization, called the Hybrid Sparrow Search Algorithm. This method takes advantages of avoiding the local optimal solution in the Sparrow Search Algorithm and the search efficiency of Particle Swarm Optimization to achieve global optimization. Experiments verified the proposed algorithm in simple and complex networks. The results show that the Hybrid Sparrow Search Algorithm has the strong global search capability to avoid local optimal solutions and satisfactory search efficiency in both low and high-dimensional spaces. The proposed method provides a new solution for hyperparameter optimization problems in deep learning models.

Keywords:

deep learning; hyperparameter optimization; Hybrid Sparrow Search; global optimization

MSC:

90C26

1. Introduction

The abilities of data collection, storage, and processing in scientific research and engineering applications have been greatly improved with the rapid development of technologies [1]. Deep learning has been introduced in different fields of science and engineering, especially in data processing and analysis [2]. Based on artificial neural networks, deep learning imitates the mechanism of the human brain to process data information such as images, sounds, and texts. Deep learning has shown its potential in having great learning ability, wide coverage, high adaptability, and excellent portability in data processing and analysis [3].

Deep learning uses a network structure with multiple layers. Each layer processes the received signal and passes it to the next layer. In a deep neural network, many layers can be used between the input and output layers. These layers can perform linear and non-linear data transformations [4].

Hyperparameters are parameters that need to be determined before forming a neural network, such as the batch size, learning rate, and dropout rate. The performance of a deep learning model will be greatly impacted by the hyperparameter configuration.

In general, building an effective deep learning model is a complex and time-consuming process that involves determining an appropriate structure of a deep learning model and its hyperparameters. The performance of a deep learning algorithm is greatly affected by its hyperparameters. Therefore, determining hyperparameters is an important task in applications of deep learning. The purpose of this paper is to propose a hyperparameter optimization algorithm with superior performance.

Different methods have been applied in setting hyperparameters for deep learning, such as the manual search, grid search, and random search. However, these methods have the problems of poor performance in high-dimensional models, inefficiency, and low accuracy. Therefore, an effective method is required for the optimization of hyperparameters for deep learning algorithms.

Hyperparameter optimization finds a set of hyperparameters to fit the needs of a deep learning model. The search is conducted using a mathematical model of optimization [5]. The hyperparameter optimization is normally considered a “black box” searching process to determine the hyperparameter configuration of a deep learning model [6].

This paper proposes a method for hyperparameter optimization, by combining the Sparrow Search Algorithm and Particle Swarm Optimization, called the Hybrid Sparrow Search Algorithm. This method uses the advantages of the avoidance of the local optimal solution in the Sparrow Search Algorithm, and the search efficiency of Particle Swarm Optimization, to achieve global optimization.

The contributions of this research are as follows: (1) development of a heuristic algorithm with a strong global search ability; (2) improvement of the hyperparameter optimization; (3) application of the proposed algorithm in neural networks.

The remainder of this paper is organized as follows. Section 2 introduces the existing research. Section 3 provides details of the proposed method. Experiments to verify the proposed method are discussed in Section 4. Section 5 presents a discussion and future work, and the conclusion of this paper is presented in Section 6.

2. Related Research

2.1. Hyperparameter Optimization

Different methods have been proposed for the hyperparameter optimization including manual search, grid search, random search, and Bayesian optimization.

Manual search [7] can determine hyperparameters for simple models. Before the emergence of the big data era, neural network models used to process data were generally not complicated. Hyperparameters of a model could be manually decided by field experts. However, with big data applications, there are increased hyperparameters in a neural network model, for which manual search is not able to meet the demand [8]. Automatic hyperparameter design has become a new research field.

Larochelle [9] proposed a grid search algorithm to balance the system performance and computation efficiency in determining hyperparameters. Grid search is an exhaustive method that takes a compromise between computational overhead and performance. For a large dataset, it has an “exponential explosion” problem [10] with reduced search efficiency. Bergstra [11] proposed a random search method that is simple and easy to use. However, this method is largely blind and has low adaptability. It is not a high-performance hyperparameter search method.

For sequential models [12,13], Bayesian optimization [14,15,16,17] is one of the most classic methods of hyperparameter optimizations. Compared with manual and random search methods, Bayesian optimization can fully use the information of previous searches in solving some complex problems [18]. However, this method can easily fall into a local optimal solution as it samples only around the optimal point.

Heuristic algorithms are based on experience that provides a feasible solution to the problem at an acceptable cost (computing time and space) [19]. They have been widely used in hyperparameter optimization. Sun [20] used a Simulated Annealing (SA) algorithm with a fast convergence rate for clustering problems of neural networks. Zhang [21] and Francescomarino [22] used Genetic Algorithms (GAs) to optimize network models based on ideas of heredity, crossover, and mutation in practical problems. As an improved evolutionary algorithm, Covariance Matrix Adaptation Evolutionary Strategies (CMA-ESs) [23] have been used to optimize neural networks. Lorenzo [24] and Djenouri [25] used Particle Swarm Optimization (PSO) with a fast convergence rate to optimize a deep learning network. However, these heuristic algorithms have a common problem, namely, that they easily fall into a local optimal solution when the objective function is non-convex. They are not effective in applications.

There are some recently proposed hyperparameter optimization methods. Ozturk [26] used a stochastic gradient descent to optimize the echo state network. Hu [27] used the Chimp Optimization Algorithm to increase the reliability and real-time capability of the network for classifying chest X-ray images. Kalita [28] developed Moth flame optimization and knowledge-based-search to optimize the hyperparameters of Support Vector Machine (SVM) [29]. Wu [30] used the sine–cosine algorithm to tune parameters of the Extreme Learning Machine (ELM) for diagnosing COVID-19 positive cases. Wang [31] used Whale Optimization for a real-time COVID-19 detector with parallel implementation capability. Several other algorithms have also been used to diagnose COVID-19 in recent years [32,33,34]. However, these methods only process some specific models.

In summary, the existing hyperparameter optimization algorithms generally have the problems of poor global search ability and propensity to fall into local optimal solutions [35,36].

2.2. Particle Swarm Optimization (PSO) and Sparrow Search Algorithm (SSA)

PSO [37] is an evolutionary algorithm inspired by the regularity of bird swarm activities. Based on the bird swarm activity behavior, PSO shares information about individuals in the entire swarm, for the evolution process from disorder to order in a problem space, to obtain the optimal solution.

PSO is an optimization algorithm based on iteration. The system is initialized as a set of random solutions to iteratively search for the optimal value. The algorithm searches through particles following the optimal particle in the solution space. Although this algorithm has a high search efficiency, it cannot avoid local optimal solutions when all individuals are concentrated near a local optimal solution.

The Sparrow Search Algorithm (SSA) [38] uses heuristic search to simulate the foraging process of sparrows as a kind of discoverer–follower model with a scouting and early warning mechanism. SSA has the ability to avoid the local optimal solution. However, it converges slowly and cannot obtain the optimal solution within an acceptable time [38]. In addition, our research found that this algorithm has data overflow problems caused by excessive exponents when the solution value is large. As a result, a large number of solutions are concentrated on the upper bound of the feasible region, which reduces the diversity of solutions and affects the algorithm performance. In addition, SSA has not been used to optimize hyperparameters of the neural network model.

In summary, the existing methods generally have poor global search ability. Although SSA has the ability to avoid local optimal solutions, it cannot be used in hyperparameter optimization problems due to the problem mentioned above. As a swarm intelligence algorithm, PSO is efficient for complex problems such as hyperparameter optimization. However, it also easily falls into a local optimal solution. This paper proposes a Hybrid Sparrow Search Algorithm (HSSA) having the advantages of both SSA and PSO for strong global search ability in complex problems to obtain the optimal solution within an acceptable time for hyperparameter optimization.

3. Proposed Approach

3.1. Hybrid Sparrow Search Algorithm (HSSA)

In order to address the problem of the existing hyperparameter optimization methods, which easily fall into local optimal solutions, we combine SSA with PSO, termed HSSA, by taking advantage of the velocity and displacement formula in PSO on the basis of SSA. The details are as follows.

(1): Algorithmic Modeling

Among sparrows, individuals with a high fitness value act as discoverers, and other individuals act as followers. At the same time, a certain proportion of individuals in the population is selected for detection and early warning. If any danger is found, they will search alternatives. The population with n sparrows is as follows:

X =  [\begin{matrix} x_{1}^{1} & x_{1}^{2} & \dots & x_{1}^{m} \\ x_{2}^{1} & x_{2}^{2} & \dots & x_{2}^{m} \\ \dots & \dots & \dots & \dots \\ x_{n}^{1} & x_{n}^{2} & \dots & x_{n}^{m} \end{matrix}]

(1)

where m represents the dimension of variables to be optimized, and n is the number of individuals. m depends on the dimension of the problem. n represents the size of the population. In general, a large n leads to high population diversity and high optimization accuracy, but the iteration speed is slow. The fitness value of all sparrows can be expressed as follows:

F =  [\begin{matrix} f (\begin{matrix} x_{1}^{1} & x_{1}^{2} & \dots & x_{1}^{m} \end{matrix}) \\ f (\begin{matrix} x_{2}^{1} & x_{2}^{2} & \dots & x_{2}^{m} \end{matrix}) \\ \dots \\ f (\begin{matrix} x_{n}^{1} & x_{n}^{2} & \dots & x_{n}^{m} \end{matrix}) \end{matrix}]

(2)

where f is the fitness value. F contains the fitness of all individuals in the entire population.

(2): Basic Rules

Discoverers usually have high energy reserves and are responsible for searching for food-rich areas. They provide foraging areas and directions for all followers. The level of energy reserve depends on the fitness value of every individual.

As soon as predators are detected, sparrows begin to chirp to send alarm signals. If the alarm value is greater than the safe value, discoverers will take followers to other safe areas.

Discoverers and followers change dynamically. As long as a better source of food can be found, a sparrow can become a discoverer, but the proportion of discoverers and followers in the entire population remains unchanged. In other words, whenever a sparrow becomes a discoverer, another sparrow becomes a follower.

Followers with less food have poor foraging positions in the entire population. Hungry followers are more likely to fly to other places to get food.

During the foraging process, followers can always search for the discoverer who provides the best food, or forage around the discoverer. At the same time, in order to increase their food reserves, some followers may constantly monitor discoverers for food resources.

Once aware of the danger, individuals at the edge of population will quickly move to a safe area for better positions. Individuals located in the middle of the population will randomly fly to other sparrows.

(3): Discoverers

Discoverers account for 10–20% of the entire population. Location updates of the discoverers are given as follows:

X_{i, j}^{s + 1} = \{\begin{array}{l} X_{i, j}^{s} \cdot \exp (- \frac{i}{α \cdot s_{\max}}), \begin{array}{l} R < T \end{array} \\ X_{i, j}^{s} + g \cdot L, \begin{array}{l} \begin{matrix} \begin{matrix}  \end{matrix} \end{matrix} & R \geq T \end{array} \end{array}

(3)

where i, j, and s are the ith sparrow, jth dimension, and sth iteration, respectively. X represents location information. smax is the maximum number of iterations. α (α ∈ (0, 1]) is a random number. R (R ∈ [0, 1]) and T (T ∈ [0.5, 1]) represent the warning value and safety value, respectively. g is a random number of a normal distribution. L represents a 1 × m matrix, and each element in the matrix is 1. Generally, the larger the value of s, the better the optimization effect; however, it takes more time. The likelihood of an individual being frightened depends on T.

(4): Followers

All sparrows except discoverers are followers. The original location update formula in SSA is as follows:

X_{i, j}^{s + 1} = \{\begin{array}{l} g \cdot \exp (\frac{X_{w}^{s} - X_{i, j}^{s}}{i^{2}}), \begin{matrix}  \end{matrix} \\ X_{P}^{s + 1} + |X_{i, j} - X_{P}^{s + 1}| \cdot A^{+} \cdot L, \end{array} \begin{matrix} i > n / 2 \\ i \leq n / 2 \end{matrix}

(4)

where X_w is the worst position. Our research found that Equation (4) has data overflow problems caused by excessive exponents in the case i > n/2 when the solution value is large. For example, if X_sw = 5000, X^s_i_,j = 1000, i = 4. X^s_i_,j = g × exp(250). Obviously, this causes a data overflow problem.

In order to solve this problem and improve the global search ability, we combine Equation (4) with the velocity and displacement formula in PSO. The new location updates of followers are as follows:

V_{i, j}^{s + 1} = ω \cdot V_{i, j}^{s} + c_{1} \cdot r_{1} \cdot (B_{i, j}^{s} - X_{i, j}^{s}) + c_{2} \cdot r_{2} \cdot (B_{g, j}^{s} - X_{i, j}^{s})

(5)

X_{i, j}^{s + 1} = \{\begin{matrix} X_{i, j}^{s} + V_{i, j}^{s + 1}, \begin{matrix} \begin{matrix}  \end{matrix} \end{matrix} \\ X_{P}^{s + 1} + |X_{i, j} - X_{P}^{s + 1}| \cdot A^{+} \cdot L, \end{matrix} \begin{matrix} i > n / 2 \\ i \leq n / 2 \end{matrix}

(6)

where V represents speed. ω (ω ∈ [0, 1]) is inertia weight. c₁ and c₂ are learning factors that generally take values of 0–4. r₁ and r₂ are random numbers between 0 and 1. B_i is the historical optimal solution of the ith sparrow. B_g is the global optimal solution of the entire population. X_p is the best position occupied by the discoverers. A represents a 1 × m matrix, and each element in the matrix is 1 or −1, and A⁺ = A^T(AA^T)⁻¹. ω is non-negative. In general, when it is large, the global search ability is strong; when it is small, the local search ability is strong. c₁ and c₂ are the individual learning factor and social learning factor of each individual, respectively. Nandar’s [39] experiment showed that satisfactory solutions can be obtained when c₁ and c₂ are constants; usually, c₁ = c₂ = 2.

Equation (5) is the speed update formula. Equation (6) is formed by adding the speed update formula to the position update formula of followers. This improvement not only solves the data overflow problem, but also improves the search speed. It enables HSSA to search for the hyperparameter optimization in an acceptable time.

(5): Vigilantes

Location updates of vigilantes are as follows:

X_{i, j}^{s + 1} = \{\begin{array}{l} B_{g, j}^{s} + β \cdot |X_{i, j}^{s} - B_{g, j}^{s}|, \begin{matrix} f_{i} > f_{g} \end{matrix} \\ X_{i, j}^{s} + k \cdot (\frac{|X_{i, j}^{s} - X_{w}^{s}|}{(f_{i} - f_{w}) + ε}), \begin{matrix} f_{i} \leq f_{g} \end{matrix} \end{array}

(7)

where X_w is the worst position. β is a parameter for controlling the step length. It is a random number that obeys the standard normal distribution. k is a random number between −1 and 1. f_i is the fitness value of the current sparrow. f_g and f_w are global best and global worst fitness values, respectively. ε is a constant to avoid zero in the denominator.

(6): Algorithm Framework

HSSA is composed of SSA and PSO. As shown in Figure 1, the left side of the flowchart is the SSA part, and the right side is the PSO part. The SSA part includes the position calculation of discoverers, followers, and vigilantes. The PSO part calculates the position of followers when i > n/2. The steps of HSSA are as follows.

Firstly, the initial population is randomly generated. Secondly, a fitness function is determined to evaluate the fitness of each individual and update the best solution. Thirdly, the population is divided into discoverers and followers based on fitness to update positions of discoverers and followers. If i > n/2, the PSO part is used to calculate the position of followers. Vigilantes are then randomly generated to update their position. Finally, the global best solution is calculated to determine whether the ending condition is met. The above steps are repeated until an individual that meets the ending condition is found. An individual who meets the ending condition is considered the best solution of HSSA. The pseudo-code is shown in Algorithm 1.

Algorithm 1. Procedure Hybrid Sparrow Search Algorithm
1	Input: individuals n, dimension m, iterations s_max
2	Output: optimal value
3	Initialize the population n, individual optimal value f_i and global optimal value f_g
4	for s in s_max do
5	Divide the population n into discoverers n_d and followers n_f
6	for i in n_d do
7	for j in m do
8	if R < T then
9	$X_{i, j}^{s + 1} = X_{i, j}^{s} \cdot \exp (- \frac{i}{α \cdot s_{\max}})$
10	else
11	$X_{i, j}^{s + 1} = X_{i, j}^{s} + g \cdot L$
12	for i in n_f do
13	for j in m do
14	if i > n/2 then
15	$V_{i, j}^{s + 1} = ω \cdot V_{i, j}^{s} + c_{1} \cdot r_{1} \cdot (B_{i, j}^{s} - X_{i, j}^{s}) + c_{2} \cdot r_{2} \cdot (B_{g, j}^{s} - X_{i, j}^{s})$
16	$X_{i, j}^{s + 1} = X_{i, j}^{s} + V_{i, j}^{s + 1}$
17	else
18	$X_{i, j}^{s + 1} = X_{P}^{s + 1} + \|X_{i, j} - X_{P}^{s + 1}\| \cdot A^{+} \cdot L$
19	Randomly generate vigilantes n_v
20	for i in n_v do
21	for j in m do
22	if f_i > f_g then
23	$X_{i, j}^{s + 1} = B_{g, j}^{s} + β \cdot \|X_{i, j}^{s} - B_{g, j}^{s}\|$
24	else
25	$X_{i, j}^{s + 1} = X_{i, j}^{s} + k \cdot (\frac{\|X_{i, j}^{s} - X_{w}^{s}\|}{(f_{i} - f_{w}) + ε})$
26	for i in n do
27	Update f_i
28	Update f_g
29	if f_g meet the requirement then
30	exit for
31	Return the optimal value

3.2. Fitness Function

In order to evaluate the algorithm objectively, we chose the average accuracy of a validation set as the training accuracy of the learning algorithm [40]. In order to facilitate the process and comparison of results, the error is defined as the fitness function value [41]. Equation (8) is the fitness function used for the evaluation of hyperparameters. A smaller fitness function value indicates better hyperparameters.

F i t n e s s = 1 - \frac{\sum_{i = 1}^{n} a c c u r a c y_{i}}{n}

(8)

4. Experiments

Experiments were conducted to verify the performance of HSSA by comparing it with several algorithms that are currently recognized as being excellent. A computer with an Intel i7 CPU, NVIDIA RTX3070 GPU, and 32 GB memory was used. The neural network model was built using TensorFlow. Random Search, Bayesian Optimization, CMA-ES, SA, GA, PSO, SSA, and HSSA were all written in Python 3. The details are as follows.

4.1. Convolutional Neural Network

The convolutional neural network (CNN) is a well-known deep learning architecture inspired by the biological vision mechanism [42]. Convolutional neural networks rely on convolution and pooling to identify information [43] and have been widely used in many fields such as target detection and image classification [44]. LetNet-5 [45] is a classic convolutional neural network and is often used to test algorithms. We used LetNet-5 in subsequent experiments to verify the performance of our method in low-dimensional space. A LetNet-5 model is shown in Figure 2.

In Figure 2, C1 and C2 are convolutional layers, P1 and P2 are pooling layers, and F1 and F2 are fully connected layers. In order to verify HSSA in complex networks, we propose a more complex convolutional neural network based on AlexNet [46], as shown in Figure 3.

In Figure 3, C1–C5 are convolutional layers, P1–P3 are pooling layers, and F1–F4 are fully connected layers. The role of each layer in the neural network is as follows.

The input layer determines the type and style of the data entered.
The convolutional layer is performed on two matrices. The convolution kernel moves on the input matrix with a certain step length. The output matrix is obtained after the convolution operation.
The pooling layer is used for down sampling. The pooling layer continuously reduces the size of the data space. The number of parameters and calculations are decreased to control data over-fitting.
Fully connected layers connect to all nodes of the previous layer, thus integrating extracted features in mapping distributed features to the sample label space.
The output layer outputs the final result.

4.2. Performance Verification

The performance of HSSA was verified using two experiments with MNIST and Five Flowers datasets. The MNIST dataset is relatively simple and has a high classification accuracy. The Five Flowers dataset is relatively complex, and it is difficult to achieve a high classification accuracy using it. For the search effect in low-dimensional and high-dimensional spaces, two neural network models with different levels of complexity were used.

The experiment with the MNIST dataset uses the LetNet-5 neural network model, which is relatively simple. The optimized number of hyperparameters is small. As an optimization process of HSSA in low-dimensional space, the experimental process is relatively simple. The experiment on the Five Flowers dataset uses a deep convolutional neural network with a complex structure and more parameters. As an optimization process of HSSA in high-dimensional space, the experimental process is relatively complex. The experimental results were used to compare HSSA with Random Search, Bayesian Optimization, CMA-ES, SA, and GA, which are all recognized as excellent methods in the field.

The settings of all algorithms are shown in Table 1.

4.2.1. Experiment on MNIST Dataset

MNIST [47,48] is a classic dataset used for simple classification problems in machine learning. The dataset consists of 70,000 handwritten digital grayscale images, which are divided into 10 categories including numbers 0–9. These 70,000 images are divided into training and verification sets. Some sample pictures in the MNIST dataset are shown in Figure 4.

Lecun evaluated hyperparameters in very high dimensions and found that their performance changes were only attributed to a few hyperparameters [45]. We selected six important hyperparameters of the LetNet-5 convolutional neural network as the optimization objects, including the number of F1 units, number of F2 units, L2 weight decay, batch size, learning rate, and dropout rate. The range of each hyperparameter is listed in Table 2.

Table 3 shows the parameter settings of the LetNet-5 convolutional neural network that remain unchanged during the simple network experiment.

A criterion of evaluating the algorithm performance in experiments is the verification error generated during the training when the training time is fixed [49]. Since the running time in the experiment is mainly consumed by neural network training, the running time of the optimization algorithm itself can be negligible. Let n be the run time of the neural network, where n = iterations × individuals. The complexity of the algorithm is O(n). Moreover, different algorithms require different training times for each iteration, so it is unreasonable to use only the number of iterations to represent the training time. Therefore, in order to ensure that the training time of each algorithm is the same, each algorithm during the experiment was set according to Table 4.

The error of each algorithm changes with the number of iterations is shown in Figure 5.

The comparison shows that HSSA is more effective than the other five algorithms at 280 iterations, and achieves the best effect at 530 iterations. HSSA has a strong global search capability of finding better hyperparameters than the other algorithms in simple neural networks. The experiment proves that HSSA performs well in a simple neural network.

4.2.2. Experiment on Five Flowers Dataset

The Five Flowers dataset (https://www.kaggle.com/alxmamaev/flowers-recognition, accessed on 3 April 2021) is a classic dataset used for complex classification problems in the machine learning field. The dataset consists of 3670 RGB images in 5 categories, including daisies, dandelions, roses, sunflowers, and tulips. These images are divided into a training set and validation set. Some sample pictures in the five flowers dataset are shown in Figure 6.

In this experiment, 11 important hyperparameters of complex convolutional neural networks were selected as optimization objects, including the number of F1 units, number of F2 units, number of F3 units, number of F4 units, L2 weight decay, batch size, learning rate, F1 dropout rate, F2 dropout rate, F3 dropout rate, and F4 dropout rate. The range of each hyperparameter is shown in Table 5.

Table 6 shows parameter settings of the complex convolutional neural network that remain unchanged during experiment 2.

Settings of each algorithm in the experiment are shown in Table 7.

The error of each algorithm varies with the number of iterations, as shown in Figure 7. Figure 7 shows that the performance of SA in the complex convolutional neural network model is unexpectedly poor. In order to test whether this situation is caused by errors in the experimental operation, the same experiment was carried out twice according to the same standard. Finally, we found that this situation is not caused by experimental operation errors, but by defects of SA itself. SA tends to fall into a local optimal solution and continues to oscillate around it during the search process. This shows that SA is not appropriate in high-dimensional spaces. Errors of the model optimized by the other four methods are all around 0.3. However, HSSA is still the best. Compared with the other excellent algorithms, HSSA shows a better global search capability and does not easily fall into local optimal solutions. HSSA can find better hyperparameters than the other algorithms in complex neural networks. The experiment proves that HSSA performs well in a complex neural network.

4.3. Meaning Verification

The meaning verification experiment compares the performance of HSSA, SSA, and PSO in neural networks, and analyzes the improvement. Although results of the performance verification experiment show that HSSA is superior to several other excellent algorithms, it does not prove meaningful improvements. In order to further verify the significance of the algorithm improvement, the meaning verification experiment was carried out.

The meaning verification experiment used the same neural networks model and datasets. We selected the original SSA and PSO as the comparison objects of HSSA. Algorithm settings during the experiment are shown in Table 8.

Results of the meaning verification experiment on the MNIST dataset are shown in Figure 8.

Figure 8 shows that the effect of HSSA surpasses that of the other two algorithms at 280 iterations, and the best effect is achieved at 530 iterations. Results of the meaning verification experiment on the Five Flowers dataset are shown in Figure 9.

Figure 9 shows that effects of SSA and PSO are almost the same. HSSA is more effective than the other two algorithms at 80 iterations and achieves the best effect at 470 iterations.

According to the experimental results for both MNIST and Five Flowers datasets, HSSA is better than SSA and PSO. It is proved that the optimization performance of the algorithm is greatly improved in both low-dimensional and high-dimensional spaces. It can be concluded that the improvement is of great significance.

4.4. Result Analysis

4.4.1. Optimization Effect Analysis

In order to further compare the eight methods used in experiments, experimental results are analyzed in Table 9 and Table 10 for the mean error, the minimum error, and the number of iterations required to reach the minimum error. The minimum error represents the ability of the algorithm to search for the optimal solution. A small minimum error indicates strong global search capability. The mean error represents the overall effect of the algorithm in the iterative process. It does not change dramatically for several generations. The number of iterations represents the speed of convergence. A small number of iterations indicates fast convergence.

From Table 9, it is observed that, except for the random search, the mean error of each algorithm is concentrated around 0.01, indicating that the overall effect of each algorithm is not significantly different. The minimum error of HSSA is more than 10.42% lower than that of the other classic algorithms, indicating that HSSA has a strong optimization performance. Compared with PSO and SSA, the minimum error of HSSA is reduced by 12.24% and 17.31%, indicating that our improvement is significant.

From Table 10, as SA tends to fall into a local optimal solution in high-dimensional problems, it is not suitable for complex neural networks. The mean error of HSSA is at least 4.34% lower than that of the other algorithms, which shows that HSSA has the best overall optimization result. The minimum error of HSSA is at least 8.16% lower than that of the other classic algorithms, indicating that the optimization of HSSA is excellent. The minimum error of HSSA is 16.82% and 16.37% lower than that of PSO and SSA, respectively, indicating that our improvement is significant.

In summary, HSSA has an excellent performance in both low-dimensional and high-dimensional spaces. Its performance has been greatly improved.

4.4.2. Global Search Capability Analysis

Although previous experiments prove the excellent performance of HSSA, they cannot prove the global optimization solution of the improved HSSA.

In order to verify the global search capability of the improved HSSA, further analysis was conducted, as shown in Table 11, Table 12, Table 13, Table 14, Table 15 and Table 16, which indicate the location of each individual in the population at the end of the algorithm operation. Small differences between individuals in the population indicate that the algorithm can easily fall into a local optimal solution. On the contrary, large differences between individuals indicate that the population contains more global information due to the global search ability [39]. In these tables, Bs is the batch size, L2 is L2 weight decay, F1–F4 is the number of F1–F4 units, Dr is the dropout rate, and Lr is the learning rate.

In order to visually compare the degree of dispersion of each group of data, their standard deviations were used as evaluation criteria. A large standard deviation indicates a high degree of dispersion [50]. Because these data are multi-dimensional, they can be regarded as discrete points in a high-dimensional space, and the difference between the data is the spatial distance between the points. For n points in the m-dimensional space, the standard deviation can be calculated using the following formulas:

|x_{a} - x_{b}| = \sqrt{\sum_{i = 1}^{n} {(x_{a}^{i} - x_{b}^{i})}^{2}}

(9)

\bar{x} = (\frac{1}{n} \sum_{i = 1}^{n} x_{i}^{1}, \frac{1}{n} \sum_{i = 1}^{n} x_{i}^{2}, \dots, \frac{1}{n} \sum_{i = 1}^{n} x_{i}^{m})

(10)

S = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(|x_{i} - \bar{x}|)}^{2}}

(11)

In addition, due to the huge difference in the value range of each parameter, using original values of parameters will cause different parameters to have different degrees of influence on the standard deviation. Therefore, each parameter needs to be processed, so the original values of each parameter were converted to its relative position within the value range [51]. The processing method is shown in Equation (12):

t^{'} = \frac{t - t_{\min}}{t_{\max} - t_{\min}}

(12)

where t is the original value of the parameter, and t′ is the relative position of the parameter. The converted relative positions are not affected by the value range. Each relative position has the same influence on the standard deviation.

Figure 10 shows the standard deviations when the iteration of PSO, SSA, and HSSA is completed in experiments for two different datasets.

Figure 10 shows the standard deviations of PSO, SSA, and HSSA on the MNIST dataset and Five Flowers dataset. It can be observed that regardless of experiments on the MNIST dataset or Five Flowers dataset, the standard deviation of PSO is very small. It means that PSO can easily fall into a local optimal solution as assumed. In experiments of the MNIST dataset and Five Flowers dataset, the standard deviation of HSSA is larger than that of PSO, by 0.5574 and 0.5323, respectively. In experiments of the MNIST dataset, the standard deviation of HSSA is 0.1174 higher than that of SSA. This shows that individuals of HSSA are even more scattered than those of SSA in experiments with the MNIST dataset. In experiments with the Five Flowers dataset, the standard deviation of HSSA is 0.085 lower than that of SSA, which is acceptable. In summary, the global search ability of HSSA could be verified by the experiment results.

4.5. Stability Analysis of HSSA

In the experiments, initial values of hyperparameters in optimization algorithms were randomly generated. Different initial values may lead to different experimental results. In order to verify the optimization ability of HSSA in the neural network under randomness, we conducted five experiments on MNIST and Five Flowers, respectively. The experiment settings were the same for both experiments. The mean and standard deviation of experimental results represent the effect of randomness on the HSSA performance.

Table 17 and Table 18 show the mean and standard deviation of the experimental results of HSSA in LetNet-5 and complex networks. Table 17 shows that the LetNet-5 model optimized by HSSA obtains a low error on the MNIST dataset. In addition, both average and minimum errors are very low. Standard deviations of the average and minimum errors are also very low. This shows that results of the five experiments are all very satisfactory, and the optimization performance of HSSA in a simple convolutional neural network model is very stable. Table 18 shows that the complex neural network optimized by HSSA has a gap between the mean and minimum errors on the Five Flowers dataset. The main reason for this phenomenon is that Five Flowers is a complex dataset, which is different from MNIST. A high classification accuracy cannot be achieved in the initial and middle stage optimization. Therefore, the final mean is high due to the high error in the initial and middle optimization stage. Even so, standard deviations of mean errors and minimum errors in the complex neural network are within an acceptable range. This shows that the optimization performance of HSSA in the complex convolutional neural network model is relatively stable.

Comparing Table 17 and Table 18 with Table 9 and Table 10, respectively, it can be seen that the minimum error of each experiment of HSSA is smaller than that of the other algorithms. This shows that the optimization performance of HSSA is not affected by randomness and verifies the stability of HSSA.

5. Discussion

Bayesian optimization, random search, and grid search are classic methods in neural network hyperparameter optimization. Although optimization methods based on heuristic algorithms and other methods proposed in recent years have also been used in the hyperparameter optimization of neural networks, they generally have poor global search capabilities and easily fall into local optimal solutions. Most of them are only suitable for solving a specific problem without universality. Therefore, the current heuristic algorithms are not effective for hyperparameter optimization. This paper improves heuristic algorithms for the hyperparameter optimization of neural networks. It provides a new research direction for solving problems of neural network hyperparameter optimization, which is to study new heuristic algorithms and apply them in the hyperparameter optimization.

The research in this paper validates the hybrid heuristic algorithm. The combination of different algorithms results in a new method that can benefit from the advantages of each algorithm. As a hybrid heuristic algorithm, HSSA embodies advantages of both PSO and SSA.

However, HSSA outperforms other methods when it iterates 280 times in the simple network. Although the optimal solution of HSSA is better than that of other methods, it requires sufficient iterations. In addition, HSSA contains some parameters that need to be manually set. If these parameters are not set properly, the algorithm may not be as effective as some traditional methods. This aspect also needs to be studied through a large number of experiments in future work.

HSSA is essentially an optimization method. It can also be used to solve other optimization problems. Moreover, as a novel heuristic algorithm, HSSA is a kind of swarm intelligence algorithm with potential parallelism, which has not been fully utilized. Due to the lack of study of the SSA, its true value has not yet been fully discovered. In future work we will further improve the SSA to take full advantage of its global search capability.

6. Conclusions

In order to improve neural network hyperparameter optimization for global search ability, we propose a new HSSA algorithm to avoid local optimal solutions. This algorithm fixes the data overflow defect of the original SSA and combines the advantages of the strong ability of SSA to find a global optimization solution with the high search speed of PSO. It performs well in both simple and complex networks.

The performance verification experiments on simple and complex networks prove that HSSA is an excellent hyperparameter optimization method. The minimum error of HSSA in simple networks is about 10% lower than that of the other classic algorithms. It is about 8% lower than that of the other classic algorithms in complex networks. Experimental results show that HSSA has excellent optimization performance to find better solutions than other algorithms.

Results of the meaning verification experiment on simple and complex networks prove the significance of HSSA’s improvement. By combining SSA and PSO, the data overflow problem caused by SSA is solved and the search speed is improved. The performance of the algorithm in neural networks is improved. In a simple network, the minimum error of HSSA is about 12% and 17% lower than that of PSO and SSA, respectively. In a complex network, the minimum error of HSSA is about 12% lower than that of PSO and SSA.

In addition, the stability analysis of HSSA proves that the optimization performance of HSSA is not affected by randomness, and shows that HSSA is stable and adaptable. In short, based on the research results of this article, HSSA is proven to be an excellent hyperparameter optimization algorithm.

However, this study has limitations. During the optimization process, the neural network may run hundreds or thousands of times. It takes hours to process a large dataset for some of the new network models proposed in recent years, which thus have a high time cost. Therefore, it is difficult to experiment with overly complex cases. In the future, the optimization algorithm will be applied to more complex situations with the continuous improvement in computer performance.

Although some achievements were made in this study, some future work needs to be undertaken. Firstly, in simple networks, the effect of HSSA is not significant when the number of iterations is small. Speeding up the early search speed will be a future research direction. Secondly, some parameters have an impact on the optimization performance of HSSA. Another future research direction will be to study setting appropriate parameters to improve the optimization performance. Thirdly, the combination of two algorithms provides a direction for new algorithms possessing the advantages of these two algorithms. Finally, limited by computer performance, current hyperparameter optimization algorithms are not suitable for particularly complex situations. Simplifying neural networks or improving computational efficiency will be a future research focus.

Author Contributions

Funding acquisition, B.G. and X.L.; Methodology, Y.F., Y.Z. and B.G.; Software, Y.Z.; Supervision, X.L., Q.P. and Z.J.; Writing—original draft, Y.F. and Y.Z.; Writing—review & editing, Y.Z., B.G., Q.P. and Z.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Hebei Province: F2020202103, the Natural Science Foundation of China (52175488) and the Scientific research youth top talent project of Hebei Province (BJ2021045). And The APC was funded by the Scientific research youth top talent project of Hebei Province (BJ2021045).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors would like to thank the School of Mechanical Engineering of Yanshan University for providing experimental conditions. Funding from the Natural Science Foundation of China (52175488) and the Scientific research youth top talent project of Hebei Province (BJ2021045) is gratefully acknowledged. The authors would like to thank the editors and the reviewers for reviewing this paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Gorshenin, A.; Kuzmin, V. Statistical Feature Construction for Forecasting Accuracy Increase and Its Applications in Neural Network Based Analysis. Mathematics 2022, 10, 589. [Google Scholar] [CrossRef]
Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417. [Google Scholar] [CrossRef]
Althubiti, S.A.; Escorcia-Gutierrez, J.; Gamarra, M.; Soto-Diaz, R.; Mansour, R.F.; Alenezi, F. Improved Metaheuristics with Machine Learning Enabled Medical Decision Support System. Comput. Mater. Contin. 2022, 73, 2423–2439. [Google Scholar] [CrossRef]
Xiong, J.; Zuo, M. What does existing NeuroIS research focus on? Inf. Syst. 2020, 89, 101462. [Google Scholar] [CrossRef]
Tantithamthavorn, C.; McIntosh, S.; Hassan, A.E.; Matsumoto, K. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Trans. Softw. Eng. 2019, 45, 683–711. [Google Scholar] [CrossRef] [Green Version]
Li, W.; Ng, W.W.Y.; Wang, T.; Pelillo, M.; Kwong, S. HELP: An LSTM-based approach to hyperparameter exploration in neural network learning. Neurocomputing 2021, 442, 161–172. [Google Scholar] [CrossRef]
van Rijn, J.N.; Hutter, F. Hyperparameter Importance Across Datasets. In Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), London, UK, 19–23 August 2018; pp. 2367–2376. [Google Scholar] [CrossRef] [Green Version]
Wang, Z.; Xuan, J. Intelligent fault recognition framework by using deep reinforcement learning with one dimension convolution and improved actor-critic algorithm. Adv. Eng. Inform. 2021, 49, 101315. [Google Scholar] [CrossRef]
Larochelle, H.; Erhan, D.; Courville, A.; Bergstra, J.; Bengio, Y. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning (ICML), Corvalis, OR, USA, 20–24 June 2007; pp. 473–480. [Google Scholar]
Lerman, P.M. Fitting Segmented Regression Models by Grid Search. J. R. Stat. Soc. Ser. C Appl. Stat. 1980, 29, 77–84. [Google Scholar] [CrossRef]
Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
Frank, H.; Holger, H.H.; Kevin, L.B. Sequential Model-Based Optimization for General Algorithm Configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization, Rome, Italy, 17 January 2011; pp. 507–523. [Google Scholar] [CrossRef] [Green Version]
Talathi, S.S. Hyper-parameter optimization of deep convolutional networks for object recognition. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 3982–3986. [Google Scholar]
Cui, J.; Tan, Q.; Zhang, C.; Yang, B. A novel framework of graph Bayesian optimization and its applications to real-world network analysis. Expert Syst. Appl. 2021, 170, 114524. [Google Scholar] [CrossRef]
Lee, M.; Bae, J.; Kim, S.B. Uncertainty-aware soft sensor using Bayesian recurrent neural networks. Adv. Eng. Inform. 2021, 50, 101434. [Google Scholar] [CrossRef]
Kong, H.; Yan, J.; Wang, H.; Fan, L. Energy management strategy for electric vehicles based on deep Q-learning using Bayesian optimization. Neural Comput. Appl. 2019, 32, 14431–14445. [Google Scholar] [CrossRef]
Jin, N.; Yang, F.; Mo, Y.; Zeng, Y.; Zhou, X.; Yan, K.; Ma, X. Highly accurate energy consumption forecasting model based on parallel LSTM neural networks. Adv. Eng. Inform. 2021, 51, 101442. [Google Scholar] [CrossRef]
Chanona, E.A.d.R.; Petsagkourakis, P.; Bradford, E.; Graciano, J.E.A.; Chachuat, B. Real-time optimization meets Bayesian optimization and derivative-free optimization: A tale of modifier adaptation. Comput. Chem. Eng. 2021, 147, 107249. [Google Scholar] [CrossRef]
Zhou, P.; El-Gohary, N. Semantic information alignment of BIMs to computer-interpretable regulations using ontologies and deep learning. Adv. Eng. Inform. 2021, 48, 101239. [Google Scholar] [CrossRef]
Sun, L.-X.; Xie, Y.; Song, X.-H.; Wang, J.-H.; Yu, R.-Q. Cluster analysis by simulated annealing. Comput. Chem. 1994, 18, 103–108. [Google Scholar] [CrossRef]
Zhang, Y.; Huang, G. Traffic flow prediction model based on deep belief network and genetic algorithm. IET Intell. Transp. Syst. 2018, 12, 533–541. [Google Scholar] [CrossRef]
Di Francescomarino, C.; Dumas, M.; Federici, M.; Ghidini, C.; Maggi, F.M.; Rizzi, W.; Simonetto, L. Genetic algorithms for hyperparameter optimization in predictive business process monitoring. Inf. Syst. 2018, 74, 67–83. [Google Scholar] [CrossRef]
Perera, R.; Guzzetti, D.; Agrawal, V. Optimized and autonomous machine learning framework for characterizing pores, particles, grains and grain boundaries in microstructural images. Comput. Mater. Sci. 2021, 196, 110524. [Google Scholar] [CrossRef]
Lorenzo, P.R.; Nalepa, J.; Ramos, L.S.; Pastor, J.R. Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Berlin, Germany, 15–19 July 2017; pp. 1864–1871. [Google Scholar] [CrossRef]
Djenouri, Y.; Srivastava, G.; Lin, J.C.-W. Fast and Accurate Convolution Neural Network for Detecting Manufacturing Data. IEEE Trans. Ind. Inform. 2021, 17, 2947–2955. [Google Scholar] [CrossRef]
Öztürk, M.M.; Cankaya, I.A.; Ipekçi, D. Optimizing echo state network through a novel fisher maximization based stochastic gradient descent. Neurocomputing 2020, 415, 215–224. [Google Scholar] [CrossRef]
Hu, T.; Khishe, M.; Mohammadi, M.; Parvizi, G.-R.; Karim, S.H.T.; Rashid, T.A. Real-time COVID-19 diagnosis from X-Ray images using deep CNN and extreme learning machines stabilized by chimp optimization algorithm. Biomed. Signal Process. Control 2021, 68, 102764. [Google Scholar] [CrossRef]
Kalita, D.J.; Singh, V.P.; Kumar, V. A dynamic framework for tuning SVM hyper parameters based on Moth-Flame Optimization and knowledge-based-search. Expert Syst. Appl. 2021, 168, 114139. [Google Scholar] [CrossRef]
Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215. [Google Scholar] [CrossRef]
Wu, C.; Khishe, M.; Mohammadi, M.; Karim, S.H.T.; Rashid, T.A. Evolving deep convolutional neutral network by hybrid sine–cosine and extreme learning machine for real-time COVID19 diagnosis from X-ray images. Soft Comput. 2021, 1–20. [Google Scholar] [CrossRef]
Wang, X.; Gong, C.; Khishe, M.; Mohammadi, M.; Rashid, T.A. Pulmonary Diffuse Airspace Opacities Diagnosis from Chest X-Ray Images Using Deep Convolutional Neural Networks Fine-Tuned by Whale Optimizer. Wirel. Pers. Commun. 2022, 124, 1355–1374. [Google Scholar] [CrossRef]
Yutong, G.; Khishe, M.; Mohammadi, M.; Rashidi, S.; Nateri, M.S. Evolving Deep Convolutional Neural Networks by Extreme Learning Machine and Fuzzy Slime Mould Optimizer for Real-Time Sonar Image Recognition. Int. J. Fuzzy Syst. 2021, 24, 1371–1389. [Google Scholar] [CrossRef]
Khishe, M.; Caraffini, F.; Kuhn, S. Evolving Deep Learning Convolutional Neural Networks for Early COVID-19 Detection in Chest X-ray Images. Mathematics 2021, 9, 1002. [Google Scholar] [CrossRef]
Chen, F.; Yang, C.; Khishe, M. Diagnose Parkinson’s disease and cleft lip and palate using deep convolutional neural networks evolved by IP-based chimp optimization algorithm. Biomed. Signal Process. Control 2022, 77, 103688. [Google Scholar] [CrossRef]
Yang, X.-S.; Deb, S. Cuckoo search: Recent advances and applications. Neural Comput. Appl. 2014, 24, 169–174. [Google Scholar] [CrossRef] [Green Version]
Ozcan, T.; Basturk, A. Transfer learning-based convolutional neural networks with heuristic optimization for hand gesture recognition. Neural Comput. Appl. 2019, 31, 8955–8970. [Google Scholar] [CrossRef]
Freitas, D.; Lopes, L.G.; Morgado-Dias, F. Particle Swarm Optimisation: A Historical Review Up to the Current Developments. Entropy 2020, 22, 362. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34. [Google Scholar] [CrossRef]
Lynn, N.; Suganthan, P.N. Ensemble particle swarm optimizer. Appl. Soft Comput. 2017, 55, 533–548. [Google Scholar] [CrossRef]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
Gašperov, B.; Begušić, S.; Šimović, P.P.; Kostanjčar, Z. Reinforcement Learning Approaches to Optimal Market Making. Mathematics 2021, 9, 2689. [Google Scholar] [CrossRef]
Wu, X.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64. [Google Scholar] [CrossRef] [Green Version]
Trappey, C.V.; Trappey, A.J.C.; Lin, S.C.-C. Intelligent trademark similarity analysis of image, spelling, and phonetic features using machine learning methodologies. Adv. Eng. Inform. 2020, 45, 101120. [Google Scholar] [CrossRef]
Escalante, H.J. Automated Machine Learning—A Brief Review at the End of the Early Years. In Automated Design of Machine Learning and Search Algorithms; Natural Computing Series; Pillay, N., Qu, R., Eds.; Springer: Cham, Switzerland, 2021; pp. 11–28. [Google Scholar] [CrossRef]
Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
Schneider, P.; Biehl, M.; Hammer, B. Hyperparameter learning in probabilistic prototype-based models. Neurocomputing 2010, 73, 1117–1124. [Google Scholar] [CrossRef] [Green Version]
Baldominos, A.; Saez, Y.; Isasi, P. A Survey of Handwritten Character Recognition with MNIST and EMNIST. Appl. Sci. 2019, 9, 3169. [Google Scholar] [CrossRef] [Green Version]
Kido, D.; Fukuda, T.; Yabuki, N. Assessing future landscapes using enhanced mixed reality with semantic segmentation by deep learning. Adv. Eng. Inform. 2021, 48, 101281. [Google Scholar] [CrossRef]
Omri, M.; Abdel-Khalek, S.; Khalil, E.M.; Bouslimi, J.; Joshi, G.P. Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning. Mathematics 2022, 10, 288. [Google Scholar] [CrossRef]
Quiroz, J.; Baumgartner, R. Interval Estimations for Variance Components: A Review and Implementations. Stat. Biopharm. Res. 2019, 11, 162–174. [Google Scholar] [CrossRef]
Zhang, J.; Kou, G.; Peng, Y.; Zhang, Y. Estimating priorities from relative deviations in pairwise comparison matrices. Inf. Sci. 2021, 552, 310–327. [Google Scholar] [CrossRef]

Figure 1. Steps of HSSA.

Figure 2. LetNet-5 neural network model.

Figure 3. Complex neural network model.

Figure 4. MNIST dataset example.

Figure 5. Result of the performance verification experiment on MNIST.

Figure 6. Five Flowers dataset example.

Figure 7. Result of the performance verification experiment on Five Flowers.

Figure 8. Results of the meaning verification experiment on MNIST.

Figure 9. Results of the meaning verification experiment on Five Flowers.

Figure 10. Standard deviations of PSO, SSA, and HSSA.

Table 1. Settings of all algorithms.

Method	Setup
Random search	Completely random
Bayesian Optimization	Tree Parzen Estimator Gaussian process EI function
CMA-ES	Initial step: σ⁽⁰⁾ = 0.618 (ub-lb) Initial evolutionary path: p_σ⁽⁰⁾ = 0, p_c⁽⁰⁾ = 0 Initial covariance matrix: C = I
SA	Initial temperature: T₀ = 100 Descent rate: α = 0.99
GA	Variation rate: P_m = 0.2 Roulette wheel selection
PSO	Inertia weight: ω = 0.6 Learning factors: c₁ = 2, c₂ = 2
SSA	Discoverer ratio: 20% Detective ratio: 10% Alert threshold: 0.8
HSSA	Discoverer ratio: 20% Detective ratio: 10% Alert threshold: 0.8 Inertia weight: ω = 0.6 Learning factors: c₁ = 2, c₂ = 2

Table 2. Range of hyperparameters to be optimized on MNIST.

Name	Range
Number of F1 units	128–1024
Number of F2 units	128–1024
L2 weight decay	0.0001–0.01
Batch size	16–128
Learning rate	0.0001–0.01
Dropout rate	0.1–0.5

Table 3. Parameter settings of the LetNet-5 convolutional neural network that remain unchanged.

Name	Value
Epochs	10
Input	Shape: 28 × 28; Dimensions: 1
Convolution layer 1	Size: 5 × 5; Strides: 1
Pooling layer 1	Size: 2 × 2; Strides: 2
Convolution layer 2	Size: 5 × 5; Strides: 1
Pooling layer 2	Size: 2 × 2; Strides: 2
Activation function	Relu; Softmax

Table 4. Settings of each algorithm in the performance verification experiment on MNIST.

Method	Setup
Random search	700 iterations
Bayesian Optimization	700 iterations
CMA-ES	700 iterations
SA	700 iterations
GA	50 initial individuals; 700 generations
HSSA	10 individuals per generation; 70 generations

Table 5. The range of hyperparameters to be optimized on Five Flowers.

Name	Range
Number of F1 units	128–1024
Number of F2 units	128–1024
Number of F3 units	128–1024
Number of F4 units	128–1024
L2 weight decay	0.0001–0.01
Batch size	16–128
Learning rate	0.0001–0.01
F1 Dropout rate	0.1–0.5
F2 Dropout rate	0.1–0.5
F3 Dropout rate	0.1–0.5
F4 Dropout rate	0.1–0.5

Table 6. Parameter settings of the complex convolutional neural network that remain unchanged.

Name	Value
Epochs	20
Input	Shape: 32 × 32; Dimension: 3
Convolution layer 1	Size: 3 × 3; Strides: 2
Pooling layer 1	Size: 2 × 2; Strides: 2
Convolution layer 2	Size: 3 × 3; Strides: 2
Pooling layer 2	Size: 2 × 2; Strides: 2
Convolution layer 3	Size: 3 × 3; Strides: 1
Convolution layer 4	Size: 3 × 3; Strides: 1
Convolution layer 5	Size: 3 × 3; Strides: 1
Pooling layer 3	Size: 2 × 2; Strides: 2
Activation function	Relu; Softmax

Table 7. Settings of each algorithm in the experiment on Five Flowers.

Method	Setup
Random search	700 iterations
Bayesian Optimization	700 iterations
CMA-ES	700 iterations
SA	700 iterations
GA	50 initial individuals; 700 generations
HSSA	10 individuals per generation; 70 generations

Table 8. Settings of each algorithm in the meaning verification experiment.

Method	Setup
SSA	10 individuals per generation; 70 generations
PSO	10 individuals per generation; 70 generations
HSSA	10 individuals per generation; 70 generations

Table 9. Results analysis on MNIST.

Method	Mean Error	Minimum Error	Number of Iterations
Random search	0.0119	0.0115	277
Bayesian Optimization	0.0100	0.0097	62
CMA-ES	0.0107	0.0102	612
SA	0.0099	0.0096	68
GA	0.0109	0.0107	404
PSO	0.0102	0.0098	330
SSA	0.0106	0.0104	230
HSSA	0.0097	0.0086	530

Table 10. Results analysis on Five Flowers.

Method	Mean Error	Minimum Error	Number of Iterations
Random search	0.3537	0.3148	312
Bayesian Optimization	0.2837	0.2692	371
CMA-ES	0.2895	0.2830	580
SA	0.7555	0.7555	1
GA	0.3250	0.2869	570
PSO	0.3113	0.2973	90
SSA	0.3063	0.2957	190
HSSA	0.2714	0.2473	460

Table 11. Positions of individuals of PSO in the experiment on MNIST.

	Bs	L2	F1	F2	Dr	Lr
1	127	0.0032	716	340	0.10	0.0008
2	127	0.0032	716	340	0.10	0.0008
3	127	0.0032	716	340	0.10	0.0008
4	127	0.0032	716	340	0.10	0.0008
5	127	0.0032	716	340	0.10	0.0008
6	127	0.0032	716	340	0.10	0.0008
7	127	0.0032	716	340	0.10	0.0008
8	127	0.0032	716	340	0.10	0.0008
9	128	0.0017	521	409	0.17	0.0029
10	98	0.0035	1003	507	0.24	0.0014

Table 12. Positions of individuals of SSA in the experiment on MNIST.

	Bs	L2	F1	F2	Dr	Lr
1	128	0.0100	1024	1024	0.50	0.0100
2	107	0.0084	954	973	0.41	0.0094
3	99	0.0097	1017	912	0.47	0.0086
4	68	0.0006	579	1015	0.15	0.0059
5	97	0.0029	629	283	0.49	0.0031
6	85	0.0061	358	420	0.30	0.0089
7	61	0.0065	690	421	0.36	0.0099
8	51	0.0074	466	212	0.34	0.0011
9	91	0.0066	306	677	0.29	0.0055
10	27	0.0081	447	379	0.35	0.0004

Table 13. Positions of individuals of HSSA in the experiment on MNIST.

	Bs	L2	F1	F2	Dr	Lr
1	16	0.0001	128	128	0.10	0.0001
2	21	0.0001	132	130	0.11	0.0001
3	19	0.0008	146	183	0.13	0.0001
4	17	0.0077	838	959	0.16	0.0066
5	72	0.0020	310	914	0.34	0.0089
6	123	0.0066	436	498	0.28	0.0041
7	45	0.0036	451	180	0.23	0.0003
8	111	0.0037	314	924	0.14	0.0024
9	116	0.0058	1019	658	0.45	0.0009
10	42	0.0046	957	614	0.43	0.0024

Table 14. Positions of individuals of PSO in the experiment on Five Flowers.

	Bs	L2	F1	F2	F3	F4	Dr1	Dr2	Dr3	Dr4	Lr
1	68	0.0010	544	137	241	232	0.50	0.50	0.50	0.50	0.0010
2	68	0.0010	544	137	241	232	0.50	0.50	0.50	0.50	0.0010
3	68	0.0010	544	137	241	232	0.50	0.50	0.50	0.50	0.0010
4	68	0.0010	544	137	241	232	0.50	0.50	0.50	0.50	0.0010
5	68	0.0010	544	137	241	232	0.50	0.50	0.50	0.50	0.0010
6	68	0.0010	544	137	241	232	0.50	0.50	0.50	0.50	0.0010
7	68	0.0010	544	137	241	232	0.50	0.50	0.50	0.50	0.0010
8	83	0.0010	472	189	380	176	0.45	0.43	0.49	0.48	0.0010
9	49	0.0038	718	444	426	264	0.34	0.40	0.37	0.39	0.0007
10	72	0.0027	385	235	192	370	0.25	0.38	0.32	0.47	0.0015

Table 15. Positions of individuals of SSA in the experiment on Five Flowers.

	Bs	L2	F1	F2	F3	F4	Dr1	Dr2	Dr3	Dr4	Lr
1	61	0.0012	128	184	184	184	0.48	0.47	0.50	0.47	0.0014
2	119	0.0001	153	455	616	510	0.26	0.40	0.35	0.25	0.0022
3	78	0.0008	997	208	482	1010	0.39	0.23	0.20	0.16	0.0002
4	61	0.0015	630	798	176	703	0.47	0.48	0.47	0.46	0.0090
5	66	0.0048	832	745	389	446	0.48	0.43	0.18	0.28	0.0046
6	67	0.0060	432	944	593	631	0.27	0.37	0.22	0.24	0.0002
7	64	0.0097	771	199	326	454	0.43	0.10	0.14	0.32	0.0093
8	21	0.0020	598	480	466	830	0.43	0.17	0.49	0.49	0.0030
9	51	0.0005	426	737	957	870	0.36	0.25	0.31	0.15	0.0094
10	43	0.0057	610	722	810	908	0.49	0.45	0.11	0.27	0.0008

Table 16. Positions of individuals of the HSSA in the experiment on Five Flowers.

	Bs	L2	F1	F2	F3	F4	Dr1	Dr2	Dr3	Dr4	Lr
1	128	0.0001	572	497	555	222	0.41	0.41	0.41	0.41	0.0010
2	35	0.0021	354	500	573	193	0.43	0.30	0.33	0.35	0.0080
3	106	0.0032	181	326	481	655	0.23	0.26	0.48	0.13	0.0034
4	55	0.0040	138	714	244	265	0.13	0.42	0.31	0.13	0.0059
5	28	0.0059	294	858	276	150	0.22	0.34	0.17	0.14	0.0007
6	63	0.0047	856	520	211	285	0.49	0.17	0.47	0.27	0.0008
7	23	0.0096	241	575	647	966	0.39	0.37	0.21	0.19	0.0026
8	43	0.0070	715	410	489	378	0.49	0.14	0.24	0.45	0.0068
9	74	0.0031	263	618	652	158	0.34	0.30	0.40	0.25	0.0081
10	28	0.0028	922	505	162	617	0.34	0.48	0.15	0.12	0.0028

Table 17. Mean and standard deviation of 5 experiments on MNIST.

Experiment	Mean Error	Minimum Error
Experiment 1	0.0095	0.0089
Experiment 2	0.0104	0.0092
Experiment 3	0.0094	0.0085
Experiment 4	0.0099	0.0081
Experiment 5	0.0108	0.0093
Mean Value	0.0100	0.0088
Standard Deviation	0.00053	0.00048

Table 18. Mean and standard deviation of 5 experiments on Five Flowers.

Experiment	Mean Error	Minimum Error
Experiment 1	0.2697	0.2457
Experiment 2	0.2821	0.2472
Experiment 3	0.2749	0.2464
Experiment 4	0.2701	0.2403
Experiment 5	0.2663	0.2396
Mean Value	0.2726	0.2438
Standard Deviation	0.00548	0.00322

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Fan, Y.; Zhang, Y.; Guo, B.; Luo, X.; Peng, Q.; Jin, Z. A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning. Mathematics 2022, 10, 3019. https://doi.org/10.3390/math10163019

AMA Style

Fan Y, Zhang Y, Guo B, Luo X, Peng Q, Jin Z. A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning. Mathematics. 2022; 10(16):3019. https://doi.org/10.3390/math10163019

Chicago/Turabian Style

Fan, Yanyan, Yu Zhang, Baosu Guo, Xiaoyuan Luo, Qingjin Peng, and Zhenlin Jin. 2022. "A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning" Mathematics 10, no. 16: 3019. https://doi.org/10.3390/math10163019

APA Style

Fan, Y., Zhang, Y., Guo, B., Luo, X., Peng, Q., & Jin, Z. (2022). A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning. Mathematics, 10(16), 3019. https://doi.org/10.3390/math10163019

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning

Abstract

1. Introduction

2. Related Research

2.1. Hyperparameter Optimization

2.2. Particle Swarm Optimization (PSO) and Sparrow Search Algorithm (SSA)

3. Proposed Approach

3.1. Hybrid Sparrow Search Algorithm (HSSA)

3.2. Fitness Function

4. Experiments

4.1. Convolutional Neural Network

4.2. Performance Verification

4.2.1. Experiment on MNIST Dataset

4.2.2. Experiment on Five Flowers Dataset

4.3. Meaning Verification

4.4. Result Analysis

4.4.1. Optimization Effect Analysis

4.4.2. Global Search Capability Analysis

4.5. Stability Analysis of HSSA

5. Discussion

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI