Optimal Agent Search Using Surrogate-Assisted Genetic Algorithms
Abstract
1. Introduction
1.1. Motivation
1.2. Contribution
1.3. Organization
2. Related Work
2.1. Surrogate-Assisted Genetic Algorithm
2.2. Neural Network Optimization of Agents
3. Materials and Methods
3.1. SGA
3.1.1. Encoding and Fitness
3.1.2. Selection and Crossover
Algorithm 1 Extended-box Crossover

```
Extended-box Crossover(p1, p2)
    L := chromosome length
    for i ← 1 to L
        m ← min(p1_i, p2_i);  M ← max(p1_i, p2_i)
        e_m ← m − α(M − m);  e_M ← M + α(M − m)
        z_i ← a random real number in [max(e_m, min_{θ∈Θ} θ_i), min(e_M, max_{θ∈Θ} θ_i)]
        // Θ: set of training networks
        // θ_i: the i-th value of θ encoded as in Section 3.1.1
    return Z = (z_1, z_2, …, z_L)
```
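A minimal NumPy sketch of Algorithm 1, assuming chromosomes are 1-D float arrays and that `theta_min`/`theta_max` hold the per-gene minima and maxima over the training networks Θ; the function name and the default α are our own choices, not values from the paper:

```python
import numpy as np

def extended_box_crossover(p1, p2, theta_min, theta_max, alpha=0.5, rng=None):
    """Algorithm 1: sample each offspring gene uniformly from the
    alpha-extended box spanned by the parents, clipped to the range
    observed in the training networks."""
    rng = rng or np.random.default_rng()
    m, M = np.minimum(p1, p2), np.maximum(p1, p2)
    em = m - alpha * (M - m)            # extended lower bound e_m
    eM = M + alpha * (M - m)            # extended upper bound e_M
    lo = np.maximum(em, theta_min)      # clip to min over Θ of θ_i
    hi = np.minimum(eM, theta_max)      # clip to max over Θ of θ_i
    return rng.uniform(lo, hi)          # offspring Z = (z_1, ..., z_L)
```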
3.1.3. Mutation and Replacement
Algorithm 2 Mutation

```
Mutation(Chromosome)
    L := chromosome length
    p := mutation rate
    for i ← 1 to L
        if random[0, 1) < p
            Chromosome[i] ← random[min_{θ∈Θ} θ_i, max_{θ∈Θ} θ_i]
```
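A matching sketch of Algorithm 2 under the same assumptions; the default `p` is the 1.5% mutation rate listed in the experiment settings:

```python
import numpy as np

def mutate(chromosome, theta_min, theta_max, p=0.015, rng=None):
    """Algorithm 2: with probability p, reset each gene to a uniform
    random value within the per-gene range of the training networks Θ."""
    rng = rng or np.random.default_rng()
    mask = rng.random(len(chromosome)) < p          # genes selected for mutation
    resampled = rng.uniform(theta_min, theta_max)   # random[min θ_i, max θ_i]
    return np.where(mask, resampled, chromosome)
```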
3.1.4. Surrogate Model with Stepwise Improvement
- Heuristic Measure using Multiple Surrogates: The advantage of ensembles that employ multiple surrogates is that they mitigate the degradation problem of a single surrogate, and the variance of the predicted performance helps avoid false optima [28]. In this study, we designed multiple homogeneous surrogates that combine classifiers and a regressor built from the same training data. The fitness function of the proposed surrogate is the sum of the regressor's prediction and the classifier outputs weighted by penalties. The decision boundaries of the classifiers are determined by the cumulative-reward quality an agent can achieve in the given problem. The heuristic measure that returns the result for an input network θ is expressed as Equation (2): f(θ) = r(θ) + p_{c1}·c1(θ) + p_{c2}·c2(θ), where c1 and c2 are classifiers for the input network θ with an output value of 1 or 0, r is a regressor that predicts the cumulative reward of θ, and p_{c1} and p_{c2} are negative penalty constants (p_{c1}, p_{c2} < 0). A code sketch of this measure follows this list.
- Data Standardization: Because this study collected experimental points through a gradient-based DRL method, numerous low-quality outliers were gathered before convergence to the optimal solution. To prevent the resulting degradation of the surrogate, we calculated the mean m and standard deviation σ of each agent's DNN parameter and adjusted the input data to follow a standard normal distribution. The standardized value Z for an input random variable X is calculated as Z = (X − m)/σ; a standardization sketch follows this list.
- Random Sampling using PCA: The training data available for designing surrogate models that optimize agents' DNNs are limited and high-dimensional, so they cannot sufficiently represent the problem domain, which may deteriorate the quality of surrogate-assisted computation. Data sampling was applied to address this problem, reduce bias, and increase robustness to outliers [29]. In high dimensions, however, the domain is very wide, and adding samples is difficult. In this study, we reduced the dimensionality through PCA [30] and added experimental points randomly and uniformly in the reduced domain; a sampling sketch follows this list. PCA analyzes the correlations between DNN parameters and extracts the principal components that represent them, thereby minimizing information loss in high-dimensional spaces while performing a dimension reduction that reflects the domain. Figure 2 shows an example of data sampling after 2D reduction through PCA: Figure 2a shows a scatter plot of the training data before sampling, and Figure 2b shows the scatter plot after random data sampling through PCA. Figure 2b qualitatively confirms that the clusters form along the decision boundaries of the classifiers introduced in the heuristic measure above.
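The following sketch illustrates the heuristic measure of Equation (2), assuming scikit-learn-style classifiers and regressor. The sign convention (c_i(θ) = 1 flags a network predicted to miss the corresponding reward boundary) and the penalty values are our assumptions, as the paper does not state them:

```python
import numpy as np

def surrogate_fitness(theta, regressor, c1, c2, p_c1=-200.0, p_c2=-100.0):
    """Equation (2): f(θ) = r(θ) + p_c1·c1(θ) + p_c2·c2(θ).

    Assumes c1/c2 output 1 when θ is predicted to fall below the
    corresponding reward boundary, so the negative penalties
    p_c1, p_c2 < 0 apply only to low-quality networks
    (illustrative constants, not taken from the paper).
    """
    x = np.asarray(theta).reshape(1, -1)
    reward = regressor.predict(x)[0]                     # r(θ)
    penalty = p_c1 * c1.predict(x)[0] + p_c2 * c2.predict(x)[0]
    return reward + penalty
```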
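The standardization step amounts to a per-parameter z-score; a minimal sketch (our own helper, equivalent in effect to scikit-learn's `StandardScaler`):

```python
import numpy as np

def standardize(X):
    """Z = (X − m) / σ, applied per DNN-parameter dimension.

    X: (n_samples, n_params) array of flattened agent networks.
    """
    m = X.mean(axis=0)              # per-parameter mean m
    sigma = X.std(axis=0)           # per-parameter standard deviation σ
    sigma[sigma == 0.0] = 1.0       # guard: constant parameters divide by 1
    return (X - m) / sigma
```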
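A sketch of the PCA-based sampling, assuming points drawn uniformly in the reduced domain are mapped back to the parameter space for labeling; the function name, bounding-box choice, and defaults are ours:

```python
import numpy as np
from sklearn.decomposition import PCA

def sample_in_pca_space(X, n_samples=10_000, n_components=2, rng=None):
    """Reduce the training data to a low-dimensional PCA domain, draw
    uniform random points inside its bounding box, and map them back
    to the original DNN-parameter space as new experimental points."""
    rng = rng or np.random.default_rng()
    pca = PCA(n_components=n_components).fit(X)
    Z = pca.transform(X)                          # reduced training data
    lo, hi = Z.min(axis=0), Z.max(axis=0)         # bounding box per component
    samples = rng.uniform(lo, hi, size=(n_samples, n_components))
    return pca.inverse_transform(samples)         # back to parameter space
```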
3.2. Experiments
3.2.1. Objective for Optimal Agent Search
3.2.2. Cart-Pole Balancing Problem
- Surrogate Modeling Data: Designing a surrogate model requires pairs of agents' DNN parameters and the cumulative rewards obtained by those DNNs. The experimental points were collected through DDQN, an off-policy DRL technique. To improve the encoding and computational speed of the SGA, we used a simple DNN architecture; the minimal topology that still collected good solutions was 1 hidden layer with 3 hidden nodes. To measure the performance of the proposed SGAs across network topologies, we set up 4 architecture cases: the number of hidden layers is 1 or 2, and the number of hidden nodes in each layer is 3 or 5. The networks collected by DDQN are fully connected, and cart-pole balancing has 4 input nodes and 2 output nodes. Therefore, the numbers of parameters for the architecture cases are 23 (= 4×3 + 3 + 3×2 + 2), 37 (= 4×5 + 5 + 5×2 + 2), 35 (= 4×3 + 3 + 3×3 + 3 + 3×2 + 2), and 67 (= 4×5 + 5 + 5×5 + 5 + 5×2 + 2), respectively; a counting sketch appears after this list. A total of 10,000 data points were collected through DDQN learning.
- Surrogate-assisted GA for the Cart-pole Balancing Problem: Following the stepwise improvements proposed in Section 3.1.4, we designed surrogate models using the surrogate modeling data. In Step 1, two classifiers and one regressor were combined into the multiple-surrogate heuristic. The classifiers were DNN-based, and their decision boundaries were set to 195 and to the median of the experimental points exceeding 195. The first decision boundary, 195, was the reward threshold of the previous environment version, CartPole-v0. The regressor in the initial experiment was a DNN, the nonlinear method commonly used in surrogates. However, in the GA results described later, the DNN regressor could not find a solution achieving the maximum reward of 500 for any network architecture. To improve our SGAs, we replaced the regressor with support vector regression (SVR). SVR applies the support vector machine to regression, adjusting the regression line to include as much data as possible inside the margin. In addition, SVR is robust to outliers and overfitting owing to its ε-insensitive loss function; a training sketch appears after this list. In Step 2, the training data of the surrogate were standardized. In Step 3, the domain of the surrogate was reduced to 2D through PCA, and 10,000 experimental points, equal in number to the training data, were added.
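The parameter counts in the Surrogate Modeling Data item follow directly from the weights and biases of a fully connected network; a small helper (ours, for illustration) reproduces them:

```python
def num_parameters(n_in, hidden, n_out):
    """Count the weights + biases of a fully connected network.

    For cart-pole (n_in=4, n_out=2) this reproduces the four cases:
        num_parameters(4, [3], 2)    -> 23
        num_parameters(4, [5], 2)    -> 37
        num_parameters(4, [3, 3], 2) -> 35
        num_parameters(4, [5, 5], 2) -> 67
    """
    sizes = [n_in] + list(hidden) + [n_out]
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))
```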
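A sketch of the Step 1-2 regressor replacement, assuming scikit-learn; the RBF kernel and hyperparameter values are illustrative, as the paper does not report the exact SVR settings:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def fit_svr_surrogate(X, y):
    """Fit the SVR regressor on standardized agent parameters.

    X: (n_samples, n_params) DNN parameters collected by DDQN.
    y: corresponding cumulative rewards.
    """
    model = make_pipeline(
        StandardScaler(),                         # Step 2: per-parameter z-score
        SVR(kernel="rbf", C=10.0, epsilon=1.0),   # ε-insensitive loss
    )
    return model.fit(X, y)
```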
3.2.3. Lunar Lander
4. Results and Discussion
4.1. Surrogate Modeling Data
4.2. Surrogate-Assisted GA for the Cart-Pole Balancing Problem
4.3. Results for the Lunar Lander Problem
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Experiments for Surrogate-Assisted Genetic Algorithms According to the Number of Surrogate Samples in the Cart-Pole Balancing
| Number of Samples | Fitness: Best | Fitness: Average | Fitness: STD | Simulation: Best | Simulation: Average | Simulation: STD |
|---|---|---|---|---|---|---|
| 1000 | 554.068 | 551.114 | 2.077 | 500.000 | 437.360 | 138.333 |
| 3000 | 611.015 | 605.590 | 3.745 | 500.000 | 467.872 | 95.159 |
| 6000 | 548.681 | 543.419 | 2.957 | 500.000 | 469.741 | 82.721 |
| 10,000 | 578.489 | 574.012 | 2.166 | 500.000 | 500.000 | 0.000 |
Appendix B. Results of the Proposed Surrogates on Each Network Architecture with Stepwise Improvement in the Cart-Pole Balancing
Step 1-DNN

| Network | Set | RMSE | MPE | MAPE | Coef. |
|---|---|---|---|---|---|
| (1, 3)/23 | Test | 93.17 | −101.45 | 120.85 | 0.75 |
| (1, 3)/23 | Train | 90.02 | −86.51 | 106.05 | 0.77 |
| (1, 5)/37 | Test | 93.52 | −52.90 | 70.39 | 0.71 |
| (1, 5)/37 | Train | 82.83 | −45.58 | 61.20 | 0.78 |
| (2, 3, 3)/35 | Test | 82.89 | −41.53 | 60.48 | 0.85 |
| (2, 3, 3)/35 | Train | 76.57 | −39.54 | 56.16 | 0.87 |
| (2, 5, 5)/67 | Test | 90.79 | −29.88 | 46.44 | 0.82 |
| (2, 5, 5)/67 | Train | 71.74 | −22.42 | 35.92 | 0.90 |

Step 1-SVR

| Network | Set | RMSE | MPE | MAPE | Coef. |
|---|---|---|---|---|---|
| (1, 3)/23 | Test | 105.99 | −95.89 | 120.37 | 0.67 |
| (1, 3)/23 | Train | 105.29 | −80.65 | 106.10 | 0.68 |
| (1, 5)/37 | Test | 104.70 | −42.78 | 64.27 | 0.63 |
| (1, 5)/37 | Train | 100.95 | −37.04 | 57.84 | 0.66 |
| (2, 3, 3)/35 | Test | 98.72 | −51.71 | 69.44 | 0.79 |
| (2, 3, 3)/35 | Train | 98.09 | −49.99 | 67.32 | 0.80 |
| (2, 5, 5)/67 | Test | 99.72 | −41.02 | 54.95 | 0.79 |
| (2, 5, 5)/67 | Train | 92.54 | −38.60 | 51.81 | 0.82 |

Step 2-SVR

| Network | Set | RMSE | MPE | MAPE | Coef. |
|---|---|---|---|---|---|
| (1, 3)/23 | Test | 88.92 | −72.31 | 95.11 | 0.78 |
| (1, 3)/23 | Train | 83.70 | −65.83 | 89.29 | 0.79 |
| (1, 5)/37 | Test | 103.02 | −52.18 | 80.33 | 0.72 |
| (1, 5)/37 | Train | 102.12 | −48.24 | 77.56 | 0.75 |
| (2, 3, 3)/35 | Test | 103.00 | −34.63 | 59.08 | 0.77 |
| (2, 3, 3)/35 | Train | 102.55 | −34.22 | 57.86 | 0.78 |
| (2, 5, 5)/67 | Test | 78.88 | −46.47 | 66.19 | 0.91 |
| (2, 5, 5)/67 | Train | 77.48 | −45.11 | 64.04 | 0.92 |

Step 3-SVR

| Network | Set | RMSE | MPE | MAPE | Coef. |
|---|---|---|---|---|---|
| (1, 3)/23 | Test | 85.31 | −65.79 | 87.02 | 0.79 |
| (1, 3)/23 | Train | 80.73 | −59.10 | 81.09 | 0.79 |
| (1, 5)/37 | Test | 84.04 | −53.74 | 80.78 | 0.82 |
| (1, 5)/37 | Train | 82.43 | −50.60 | 78.17 | 0.81 |
| (2, 3, 3)/35 | Test | 92.84 | −27.47 | 47.75 | 0.82 |
| (2, 3, 3)/35 | Train | 92.40 | −27.49 | 46.78 | 0.82 |
| (2, 5, 5)/67 | Test | 75.16 | −42.36 | 60.92 | 0.92 |
| (2, 5, 5)/67 | Train | 72.26 | −41.16 | 58.71 | 0.92 |
References
- Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach, 4th ed.; Prentice Hall: Hoboken, NJ, USA, 2020.
- van Otterlo, M.; Wiering, M. Reinforcement learning and Markov decision processes. In Reinforcement Learning; Wiering, M., van Otterlo, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 12, pp. 3–42.
- Such, F.P.; Madhavan, V.; Conti, E.; Lehman, J.; Stanley, K.O.; Clune, J. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv 2017, arXiv:1712.06567.
- Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126.
- Mirjalili, S. Genetic Algorithm. In Evolutionary Algorithms and Neural Networks; Springer: Cham, Switzerland, 2019; pp. 43–55.
- Jin, Y. Surrogate-assisted evolutionary computation: Recent advances and future challenges. Swarm Evol. Comput. 2011, 1, 61–70.
- Pei, Y.; Gao, H.; Han, X. A Surrogate Model Based Genetic Algorithm for Complex Problem Solving. In Proceedings of the 6th Annual International Conference on Network and Information Systems for Computers (ICNISC2020), Guiyang, China, 14–15 August 2020.
- Cho, D.H.; Moon, S.H.; Kim, Y.H. Genetic feature selection applied to KOSPI and cryptocurrency price prediction. Mathematics 2021, 9, 2574.
- Kim, Y.H.; Yoon, Y.; Kim, Y.H. Towards a better basis search through a surrogate model-based epistasis minimization for pseudo-Boolean optimization. Mathematics 2020, 8, 1287.
- Cho, H.Y.; Kim, Y.H. A Genetic Algorithm to Optimize SMOTE and GAN Ratios in Class Imbalanced Datasets. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Cancun, Mexico, 8–12 July 2020; pp. 33–34.
- Cai, X.; Gao, L.; Li, X. Efficient generalized surrogate-assisted evolutionary algorithm for high-dimensional expensive problems. IEEE Trans. Evol. Comput. 2020, 24, 365–379.
- Francon, O.; Gonzalez, S.; Hodjat, B.; Meyerson, E.; Miikkulainen, R.; Qiu, X.; Shahrzad, H. Effective Reinforcement Learning through Evolutionary Surrogate-Assisted Prescription. In Proceedings of the Genetic and Evolutionary Computation Conference, Cancun, Mexico, 8–12 July 2020; pp. 814–822.
- Yu, D.P.; Kim, Y.-H. Predictability on Performance of Surrogate-Assisted Evolutionary Algorithm According to Problem Dimension. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Prague, Czech Republic, 13–17 July 2019; pp. 91–92.
- Jin, Y. A comprehensive survey of fitness approximation in evolutionary computation. Soft Comput. 2005, 9, 3–12.
- Calisto, M.B.; Lai-Yuen, S.K. EMONAS-Net: Efficient multiobjective neural architecture search using surrogate-assisted evolutionary algorithm for 3D medical image segmentation. Artif. Intell. Med. 2021, 119, 102154.
- Sun, Y.; Wang, H.; Xue, B.; Jin, Y.; Yen, G.G.; Zhang, M. Surrogate-assisted evolutionary deep learning using an end-to-end random forest-based performance predictor. IEEE Trans. Evol. Comput. 2020, 24, 350–364.
- Pholdee, N.; Baek, H.M.; Bureerat, S.; Im, Y.T. Process optimization of a non-circular drawing sequence based on multi-surrogate assisted meta-heuristic algorithms. J. Mech. Sci. Technol. 2015, 29, 3427–3436.
- Vincenzi, L.; Gambarelli, P. A proper infill sampling strategy for improving the speed performance of a surrogate-assisted evolutionary algorithm. Comput. Struct. 2017, 178, 58–70.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
- Hausknecht, M.; Lehman, J.; Miikkulainen, R.; Stone, P. A neuroevolution approach to general Atari game playing. IEEE Trans. Comput. Intell. AI Games 2014, 6, 355–366.
- Stanley, K.O.; Clune, J.; Lehman, J.; Miikkulainen, R. Designing neural networks through neuroevolution. Nat. Mach. Intell. 2019, 1, 24–35.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 14–17 February 2016; pp. 2094–2100.
- Clune, J.; Stanley, K.O.; Pennock, R.T.; Ofria, C. On the performance of indirect encoding across the continuum of regularity. IEEE Trans. Evol. Comput. 2011, 15, 346–367.
- Yoon, Y.; Kim, Y.H.; Moraglio, A.; Moon, B.R. A theoretical and empirical study on unbiased boundary-extended crossover for real-valued representation. Inf. Sci. 2012, 183, 48–65.
- Shin, S.S.; Kim, Y.H. A surrogate model-based genetic algorithm for the optimal policy in cart-pole balancing environments. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Boston, MA, USA, 9–13 July 2022; pp. 503–505.
- Zheng, N.; Wang, H.; Yuan, B. An adaptive model switch-based surrogate-assisted evolutionary algorithm for noisy expensive multi-objective optimization. Complex Intell. Syst. 2022, 8, 4339–4356.
- Shaukat, S.S.; Rao, T.A.; Khan, M.A. Impact of sample size on principal component analysis ordination of an environmental data set: Effects on eigenstructure. Ekológia 2016, 35, 173.
| Environment Parts | Value |
|---|---|
| Action space | Discrete (2) |
| Observation shape | Continuous (4) |
| Continuous observation range | ±[2.4, ∞, 24, ∞] |
| Operator/Parameter | Value |
|---|---|
| Population size | 100 |
| Number of generations | 1000 |
| Chromosome lengths | 23/37/35/67 |
| Selection method | Roulette-wheel |
| Crossover method | Extended-box |
| Mutation rate | 1.5% |
| Replacement | Elitism & Generational |
| CPU (core) | AMD Ryzen Threadripper 2990WX (32) |
| RAM | 64 GB |
| Operating system | Ubuntu 18.04 LTS |
| Programming language (version) | Python (3.7.11) |
| Environment Parts | Value |
|---|---|
| Action space | Discrete (4) |
| Observation shape | Mixed (Continuous: 6; Bool: 2) |
| Continuous observation range | ±[1.5, 1.5, 5, 5, 3.14, 5] |
| Surrogate Model | Network | Fitness: Best | Fitness: Average | Fitness: STD | Simulation: Best | Simulation: Average | Simulation: STD | p-Value |
|---|---|---|---|---|---|---|---|---|
| DNN step 1 | (1, 3) | 762.602 | 748.419 | 77.334 | 9.480 | 9.359 | 0.070 | - |
| DNN step 1 | (1, 5) | 2636.006 | 2414.450 | 412.272 | 88.470 | 11.990 | 14.202 | - |
| DNN step 1 | (2, 3, 3) | 1936.146 | 1893.751 | 32.744 | 27.290 | 13.729 | 5.578 | - |
| DNN step 1 | (2, 5, 5) | 13,117.208 | 12,286.410 | 295.666 | 9.480 | 9.345 | 0.080 | - |
| SVR step 1 | (1, 3) | 682.976 | 682.510 | 0.276 | 9.570 | 9.357 | 0.075 | 7.56 × 10^−1 |
| SVR step 1 | (1, 5) | 991.123 | 920.751 | 151.260 | 500.000 | 333.722 | 184.831 | 2.46 × 10^−11 |
| SVR step 1 | (2, 3, 3) | 969.271 | 942.010 | 92.769 | 500.000 | 42.093 | 122.381 | 4.62 × 10^−3 |
| SVR step 1 | (2, 5, 5) | 637.732 | 614.532 | 12.041 | 500.000 | 165.670 | 167.801 | 7.66 × 10^−10 |
| SVR step 2 | (1, 3) | 649.203 | 649.139 | 0.044 | 13.820 | 11.937 | 0.773 | 2.98 × 10^−11 |
| SVR step 2 | (1, 5) | 585.734 | 573.516 | 3.355 | 500.000 | 500.000 | 0.000 | 5.37 × 10^−6 |
| SVR step 2 | (2, 3, 3) | 574.514 | 557.821 | 47.189 | 500.000 | 469.507 | 73.103 | 5.29 × 10^−9 |
| SVR step 2 | (2, 5, 5) | 534.788 | 514.015 | 6.391 | 500.000 | 327.571 | 199.072 | 7.00 × 10^−4 |
| SVR step 3 | (1, 3) | 649.197 | 649.145 | 0.036 | 25.850 | 13.065 | 2.677 | 9.26 × 10^−3 |
| SVR step 3 | (1, 5) | 578.489 | 574.012 | 2.166 | 500.000 | 500.000 | 0.000 | N.A. |
| SVR step 3 | (2, 3, 3) | 574.635 | 567.215 | 17.360 | 500.000 | 491.200 | 20.261 | 1.22 × 10^−1 |
| SVR step 3 | (2, 5, 5) | 534.798 | 510.536 | 6.585 | 500.000 | 259.955 | 201.400 | 1.80 × 10^−1 |
| Method | Statistic | Best Reward | Mean Reward |
|---|---|---|---|
| Baseline | – | 303.452 | 216.660 ± 5.056 |
| Proposed SGAs | Best | 308.930 | 224.665 ± 5.657 |
| Proposed SGAs | Mean | 290.352 | 155.297 ± 8.047 |