Bio-Inspired Optimization Algorithm Associated with Reinforcement Learning for Multi-Objective Operating Planning in Radioactive Environment
Abstract
1. Introduction
- A more comprehensive multi-objective operating planning model for radioactive environments is constructed compared with [6]. Specifically, this model considers the operating difficulty level at each operating point, ignored entirely in [6], which influences the time to complete each operating task and hence the cumulative radiation dose. The newly constructed model is therefore closer to engineering practice.
- A combinatorial algorithm framework is provided that couples a bio-inspired optimization algorithm with reinforcement learning (RL): the hyper-parameters of the genetic algorithm (GA), including crossover probability, mutation probability, and population size, are adjusted by the RL agent during the iterative process so that the VTSP can be solved more efficiently.
- Comparative tests between the proposed HPAGA and several classical evolutionary computation algorithms on TSP instances of diverse scales are conducted to demonstrate the superior performance of the proposed hybrid algorithm.
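The cumulative-dose objective named in the first contribution can be sketched as follows. This is a minimal illustration only: the constant travel speed, the point-wise dose-rate field, and the scaling of task time by a difficulty factor are assumptions for illustration, not the paper's exact model.

```python
import math

def tour_cost(tour, pos, dose_rate, difficulty, base_task_time, speed=1.0):
    """Illustrative cumulative radiation dose for one visiting order.

    Assumptions (not the paper's exact model): the worker moves at a
    constant speed, absorbs dose at the destination's rate while both
    traveling and operating, and each task takes base_task_time scaled
    by a per-point difficulty factor.
    """
    total_dose = 0.0
    for i in range(len(tour)):
        a, b = tour[i - 1], tour[i]  # closed tour: last point links back to first
        travel_time = math.dist(pos[a], pos[b]) / speed
        task_time = base_task_time[b] * difficulty[b]  # harder task -> longer exposure
        total_dose += dose_rate[b] * (travel_time + task_time)
    return total_dose
```

Under this sketch, raising the difficulty level at any point increases exposure time there and thus the tour's cumulative dose, which is exactly why the difficulty term changes the optimal visiting order.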
2. Related Work
3. Problem Formulation
3.1. Radiation Dose Model
3.2. VTSP Formulation
4. Proposed HPAGA
4.1. Algorithm Framework
- Step 1: The agent obtains the current state s from the GA by computing the population fitness in a designed way. The formulation of the state space is elaborated below.
- Step 2: HPAGA selects and executes the corresponding action a according to the action selection policy of reinforcement learning, thereby adjusting the crossover rate, mutation rate, and population size of the current GA.
- Step 3: Execute the GA with the updated crossover rate, mutation rate, and population size to reach the new state s′.
- Step 4: Calculate the reward r for the transition from state s to state s′. The reward estimation method is introduced below.
- Step 5: Update the agent's knowledge according to states s and s′, reward r, and action a via the Q-learning update rule.
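The five steps above can be sketched as a minimal Q-learning loop around the GA. The state discretization, the preset action list, and the hyper-parameter values below are illustrative assumptions, not the paper's exact design.

```python
import random

# Illustrative action set: each action is a (crossover rate, mutation rate,
# population size) preset the agent may switch the GA to (assumed values).
ACTIONS = [
    (0.8, 0.01, 100), (0.9, 0.05, 100), (0.7, 0.10, 150), (0.9, 0.01, 200),
]

def discretize_state(fitnesses, bins=5):
    """Step 1 (sketch): map population fitness statistics to a discrete state
    index based on the normalized fitness spread."""
    mean = sum(fitnesses) / len(fitnesses)
    spread = (max(fitnesses) - min(fitnesses)) / (abs(mean) + 1e-9)
    return min(int(spread * bins), bins - 1)

def select_action(Q, state, epsilon=0.1):
    """Step 2 (sketch): epsilon-greedy action selection over the Q-table."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: Q[state][a])

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Step 5: the standard Q-learning update
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s][a] += alpha * (reward + gamma * max(Q[s_next]) - Q[s][a])
```

Steps 3 and 4 would sit between these calls: run one GA phase with the chosen preset (a hypothetical `run_generation`-style routine), then compute the reward, e.g., from the improvement of the best fitness between s and s′.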
4.2. Genetic Algorithm
4.3. Multi-Parameter Adaptive Reinforcement Learning
5. Experimental Results
5.1. Experimental Setup
5.2. Ablation Experiment
- Analyzing the comparative results of HPAGA_c and GA, HPAGA_c obtains lower average costs than GA across all four instances, with fewer crossover operations. This indicates that dynamically adjusting the crossover rate alone can propagate superior genes and improve the overall fitness of the population, thereby enhancing the performance of GA.
- Based on the comparative results of HPAGA_m and GA on the four instances, HPAGA_m achieves lower minimum costs than GA on the att48, berlin52, and eil101 instances, with fewer mutation operations. However, on the st70 instance, HPAGA_m's minimum and average costs are worse than GA's. This implies that dynamically adjusting the mutation rate alone can increase population diversity and enhance genetic algorithm performance, but it can also have potentially negative effects due to the influence of mutated individuals on the population.
- Reviewing the comparative results of HPAGA_p and GA, HPAGA_p acquires lower minimum and average costs than GA in all instances, which demonstrates that the population size agent is effective in improving the classical GA.
- Examining the results of HPAGA_cm, HPAGA_cm realizes lower minimum and average costs than GA, with fewer crossover and mutation operations. Compared to HPAGA_m, HPAGA_cm reaches a better balance while dynamically adjusting both crossover and mutation rates, promoting population diversity and mitigating the potential negative effects of mutated individuals by propagating superior genes.
- Among all the comparative algorithms, HPAGA achieves the best performance in most comparative indicators, including the lowest costs and the smallest standard deviation. Note that Figure 5 demonstrates that HPAGA also has the fastest convergence speed.
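The ablation variants discussed above differ only in which hyper-parameters the agent is allowed to tune. A minimal sketch of that setup and of the Best/Worst/Mean/Std indicators follows; the flag names are assumptions for illustration, not the paper's code.

```python
from statistics import mean, stdev

# Illustrative sketch: each ablation variant is the same RL-tuned GA with
# some hyper-parameter agents disabled (flag names are assumed).
VARIANTS = {
    "GA":       dict(tune_crossover=False, tune_mutation=False, tune_popsize=False),
    "HPAGA_c":  dict(tune_crossover=True,  tune_mutation=False, tune_popsize=False),
    "HPAGA_m":  dict(tune_crossover=False, tune_mutation=True,  tune_popsize=False),
    "HPAGA_p":  dict(tune_crossover=False, tune_mutation=False, tune_popsize=True),
    "HPAGA_cm": dict(tune_crossover=True,  tune_mutation=True,  tune_popsize=False),
    "HPAGA":    dict(tune_crossover=True,  tune_mutation=True,  tune_popsize=True),
}

def summarize(costs):
    """Best/Worst/Mean/Std indicators as reported per instance and variant."""
    return {"Best": min(costs), "Worst": max(costs),
            "Mean": round(mean(costs)), "Std": round(stdev(costs))}
```

Running every variant through the same per-instance summary is what makes the ablation comparison fair: any difference in the indicators can then be attributed to the enabled agents alone.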
5.3. Comparative Analysis
5.4. Limitations
6. Case Study in Simulated Radioactive Scenario
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Rehm, T.E. Advanced nuclear energy: The safest and most renewable clean energy. Curr. Opin. Chem. Eng. 2023, 39, 100878. [Google Scholar] [CrossRef]
- Zhang, D.; Yin, Y.; Luo, R.; Zou, S. Hybrid IACO-A*-PSO optimization algorithm for solving multiobjective path planning problem of mobile robot in radioactive environment. Prog. Nucl. Energy 2023, 159, 104651. [Google Scholar] [CrossRef]
- Pentreath, R.J. Radiological protection, radioecology, and the protection of animals in high-dose exposure situations. J. Environ. Radioact. 2023, 270, 107270. [Google Scholar] [CrossRef]
- Adibel, J.O.; Liu, Y.; Ayodeji, A.; Awodi, N.J. Path planning in nuclear facility decommissioning: Research status, challenges, and opportunities. Nucl. Eng. Technol. 2021, 53, 3505–3516. [Google Scholar] [CrossRef]
- Wang, Z.; Cai, J. The path-planning in radioactive environment of nuclear facilities using an improved particle swarm optimization algorithm. Nucl. Eng. Des. 2018, 326, 79–86. [Google Scholar] [CrossRef]
- Xie, X.; Tang, Z.; Cai, J. The multi-objective inspection path-planning in radioactive environment based on an improved ant colony optimization algorithm. Prog. Nucl. Energy 2022, 144, 104076. [Google Scholar] [CrossRef]
- Wu, Z.; Yin, Y.; Liu, J.; Zhang, D.; Chen, J.; Jiang, W. A novel path planning approach for mobile robot in radioactive environment based on improved deep Q network algorithm. Symmetry 2023, 15, 2048. [Google Scholar] [CrossRef]
- Liu, Y.; Li, M.; Xie, C.; Peng, M.; Wang, S.; Chao, N.; Liu, Z. Minimum dose method for walking-path planning of nuclear facilities. Ann. Nucl. Energy 2015, 83, 161–171. [Google Scholar] [CrossRef]
- Chao, N.; Liu, Y.; Xia, H.; Ayodeji, A.; Bai, L. Grid-based RRT* for minimum dose walking path-planning in complex radioactive environments. Ann. Nucl. Energy 2018, 115, 73–82. [Google Scholar] [CrossRef]
- Zhang, D.; Luo, R.; Yin, Y.; Zou, S. Multi-objective path planning for mobile robot in nuclear accident environment based on improved ant colony optimization with modified A*. Nucl. Eng. Technol. 2023, 55, 1838–1854. [Google Scholar] [CrossRef]
- Lee, M.; Jang, S.; Cho, W.; Lee, J.; Lee, C.; Kim, S.H. A proposal on multi-agent static path planning strategy for minimizing radiation dose. Nucl. Eng. Technol. 2024, 56, 92–99. [Google Scholar] [CrossRef]
- Helsgaun, K. An effective implementation of the Lin-Kernighan traveling salesman heuristic. Eur. J. Oper. Res. 2000, 126, 106–130. [Google Scholar] [CrossRef]
- Toaza, B.; Esztergár-Kiss, D. A review of metaheuristic algorithms for solving TSP-based scheduling optimization problems. Appl. Soft Comput. 2023, 148, 110908. [Google Scholar] [CrossRef]
- Applegate, D.L.; Bixby, R.E.; Chvatal, V.; Cook, W.J. The Traveling Salesman Problem: A Computational Study; Princeton University Press: Princeton, NJ, USA, 2006. [Google Scholar]
- Pan, X.; Jin, Y.; Ding, Y.; Feng, M.; Zhao, L.; Song, L.; Bian, J. H-TSP: Hierarchically solving the large-scale travelling salesman problem. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023. [Google Scholar]
- Zheng, J.; He, K.; Zhou, J.; Jin, Y.; Li, C. Reinforced Lin–Kernighan–Helsgaun algorithms for the traveling salesman problems. Knowl.-Based Syst. 2023, 260, 110144. [Google Scholar] [CrossRef]
- Valdez, F.; Moreno, F.; Melin, P. A comparison of ACO, GA and SA for solving the TSP problem. In Hybrid Intelligent Systems in Control, Pattern Recognition and Medicine; Springer: Cham, Switzerland, 2020; pp. 181–189. [Google Scholar]
- Bao, X.; Wang, G.; Xu, L.; Wang, Z. Solving the min-max clustered traveling salesmen problem based on genetic algorithm. Biomimetics 2023, 8, 238. [Google Scholar] [CrossRef] [PubMed]
- Panwar, K.; Deep, K. Transformation operators based grey wolf optimizer for travelling salesman problem. J. Comput. Sci. 2021, 55, 101454. [Google Scholar] [CrossRef]
- Mzili, T.; Mzili, I.; Riffi, M.E. Artificial rat optimization with decision-making: A bio-inspired metaheuristic algorithm for solving the traveling salesman problem. Decis. Mak. Appl. Manag. Eng. 2023, 6, 150–176. [Google Scholar] [CrossRef]
- Poornima, B.S.; Sarris, I.E.; Chandan, K.; Nagaraja, K.V.; Kumar, R.S.V.; Ben Ahmed, S. Evolutionary computing for the radiative–convective heat transfer of a wetted wavy fin using a genetic algorithm-based neural network. Biomimetics 2023, 8, 574. [Google Scholar] [CrossRef]
- Mahmoudinazlou, S.; Kwon, C. A hybrid genetic algorithm for the min–max multiple traveling salesman problem. Comput. Oper. Res. 2024, 162, 106455. [Google Scholar] [CrossRef]
- Zheng, J.; Zhong, J.; Chen, M.; He, K. A reinforced hybrid genetic algorithm for the traveling salesman problem. Comput. Oper. Res. 2023, 157, 106249. [Google Scholar] [CrossRef]
- Chen, R.; Yang, B.; Li, S.; Wang, S. A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem. Comput. Ind. Eng. 2020, 149, 106778. [Google Scholar] [CrossRef]
- Mazyavkina, N.; Sviridov, S.; Ivanov, S.; Burnaev, E. Reinforcement learning for combinatorial optimization: A survey. Comput. Oper. Res. 2021, 134, 105400. [Google Scholar] [CrossRef]
- Dou, X.; Yang, Q.; Gao, X.; Lu, Z.; Zhang, J. A comparative study on crossover operators of genetic algorithm for traveling salesman problem. In Proceedings of the 15th International Conference on Advanced Computational Intelligence (ICACI), Seoul, Republic of Korea, 6–9 May 2023. [Google Scholar]
- Reinelt, G. TSPLIB-A traveling salesman problem library. ORSA J. Comput. 1991, 3, 376–384. [Google Scholar] [CrossRef]
- Alipour, M.M.; Razavi, S.N.; Derakhshi, M.R.F.; Balafar, M.A. A hybrid algorithm using a genetic algorithm and multiagent reinforcement learning heuristic to solve the traveling salesman problem. Neural Comput. Appl. 2018, 30, 2935–2951. [Google Scholar] [CrossRef]
- Yasear, S.A.; Ku-Mahamud, K.R. Fine-tuning the ant colony system algorithm through Harris’s hawk optimizer for travelling salesman problem. Int. J. Intell. Eng. Syst. 2021, 14, 136–145. [Google Scholar] [CrossRef]
- Hammouri, A.I.; Samra, E.T.A.; Al-Betar, M.A.; Khalil, R.M.; Alasmer, Z.; Kanan, M. A dragonfly algorithm for solving traveling salesman problem. In Proceedings of the IEEE International Conference on Control System, Computing and Engineering, Penang, Malaysia, 23–25 November 2018. [Google Scholar]
- Hatamlou, A. Solving travelling salesman problem using black hole algorithm. Soft Comput. 2018, 22, 8167–8175. [Google Scholar] [CrossRef]
Instance | Index | GA | HPAGA_c | HPAGA_m | HPAGA_p | HPAGA_cm | HPAGA
---|---|---|---|---|---|---|---
att48 (33,523) 1 | Best | 34,350 | 33,900 | 34,143 | 33,639 | 33,929 | 33,601
 | Worst | 37,462 | 37,754 | 39,199 | 37,256 | 37,566 | 37,136
 | Mean | 35,492 | 35,314 | 35,740 | 35,206 | 35,499 | 35,188
 | Std | 740 | 818 | 1075 | 766 | 771 | 730
 | Num_c | 64,350 | 58,290 | 64,350 | 136,844 | 58,148 | 128,976
 | Num_m | 9876 | 9889 | 9604 | 21,070 | 9589 | 18,784
berlin52 (7542) 1 | Best | 8104 | 7550 | 7618 | 7544 | 7544 | 7544
 | Worst | 12,398 | 8791 | 9371 | 8648 | 8782 | 8894
 | Mean | 10,068 | 8242 | 8320 | 8213 | 8275 | 8116
 | Std | 1433 | 270 | 320 | 252 | 289 | 285
 | Num_c | 64,351 | 58,374 | 64,366 | 139,140 | 58,412 | 130,578
 | Num_m | 9928 | 9908 | 9604 | 21,406 | 9586 | 19,236
st70 (675) 1 | Best | 739 | 714 | 764 | 687 | 756 | 683
 | Worst | 927 | 986 | 908 | 766 | 857 | 778
 | Mean | 825 | 803 | 828 | 732 | 797 | 735
 | Std | 53 | 51 | 42 | 18 | 29 | 24
 | Num_c | 64,412 | 58,250 | 64,354 | 139,125 | 58,444 | 131,202
 | Num_m | 9910 | 9883 | 9466 | 21,397 | 9530 | 19,287
eil76 (545) 1 | Best | 636 | 589 | 611 | 567 | 591 | 565
 | Worst | 930 | 708 | 731 | 618 | 743 | 620
 | Mean | 732 | 652 | 664 | 596 | 654 | 599
 | Std | 77 | 25 | 35 | 12 | 34 | 12
 | Num_c | 64,307 | 58,341 | 64,343 | 137,773 | 58,614 | 131,330
 | Num_m | 9899 | 9916 | 9414 | 21,169 | 9529 | 19,151
gr96 (512) 1 | Best | 764 | 636 | 619 | 565 | 647 | 542
 | Worst | 1243 | 890 | 915 | 632 | 792 | 644
 | Mean | 901 | 733 | 760 | 603 | 722 | 598
 | Std | 142 | 53 | 66 | 16 | 37 | 24
 | Num_c | 64,383 | 58,172 | 64,333 | 138,313 | 58,281 | 129,260
 | Num_m | 9909 | 9911 | 9372 | 21,300 | 9412 | 18,848
eil101 (629) 1 | Best | 834 | 778 | 832 | 700 | 796 | 713
 | Worst | 1343 | 1064 | 1178 | 774 | 993 | 765
 | Mean | 1067 | 873 | 944 | 741 | 865 | 736
 | Std | 136 | 57 | 89 | 14 | 43 | 14
 | Num_c | 64,336 | 58,108 | 64,301 | 138,910 | 58,330 | 131,227
 | Num_m | 9874 | 9916 | 9355 | 21,360 | 9503 | 19,322
Instance | Method | Best
---|---|---
att48 (33,523) 1 | GA | 34,350
 | HPAGA | 33,601
 | ACO | 35,231
 | PSO | 36,996
 | BH | 34,201
 | DA | 37,226
berlin52 (7542) 1 | GA | 8104
 | HPAGA | 7544
 | ACO | 7757
 | PSO | 9218
 | BH | 8188
 | DA | 9401
st70 (675) 1 | GA | 739
 | HPAGA | 683
 | ACO | 712
 | PSO | 1031
 | BH | 723
 | DA | 797
eil76 (545) 1 | GA | 639
 | HPAGA | 565
 | ACO | 574
 | PSO | 804
 | BH | 566
 | DA | 625
gr96 (512) 1 | GA | 764
 | HPAGA | 542
 | ACO | 556
 | PSO | 1095
 | BH | 547
 | DA | 671
eil101 (629) 1 | GA | 834
 | HPAGA | 713
 | ACO | 725
 | PSO | 1159
 | BH | 720
 | DA | 813
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kong, S.; Wu, F.; Liu, H.; Zhang, W.; Sun, J.; Wang, J.; Yu, J. Bio-Inspired Optimization Algorithm Associated with Reinforcement Learning for Multi-Objective Operating Planning in Radioactive Environment. Biomimetics 2024, 9, 438. https://doi.org/10.3390/biomimetics9070438