Developing a Container Ship Loading-Planning Program Using Reinforcement Learning
Abstract
1. Introduction
1.1. Research Background
1.2. Previous Research
- A method is proposed to divide the process into Phase 1 and Phase 2 to establish an optimal stowage plan efficiently and stably for a 3D loading space.
- This study is the first to propose the application of the PPO algorithm for the container-stowage planning problem considering bay, row, and tier simultaneously.
2. Reinforcement Learning
2.1. Overview of Reinforcement Learning
2.2. Proximal Policy Optimization Algorithm
3. Applying Reinforcement Learning for Phase 1—Bay Plan
3.1. Problem Definition
3.2. Definition of Input Variables and Parameters
3.3. Application of Reinforcement Learning
3.4. Development of a Stowage Model Using the PPO Algorithm
3.5. Phase 1 Training Results
4. Application of Reinforcement Learning for Minimizing Relocations in Phase 2—Row, Tier Plan
4.1. Problem Definition
4.2. Definition of Input Variables and Parameters
4.3. Application of Reinforcement Learning
4.4. Development of a Stowage Model Using the PPO Algorithm
4.5. Training Results of the PPO Algorithm in Phase 2
4.6. Performance Comparison with the DQN Algorithm
5. Container Stowage Planning with Weight Constraints Added in Phase 2
5.1. Problem Definition
5.2. Definition of Input Variables and Parameters
5.3. Application of Weight Constraints in Phase 2
5.4. Model Training Results
6. Application of Phase 1 and Phase 2 in a 3D Stowage Space
6.1. Problem Definition
6.2. Stowage Planning
7. Program Development
7.1. Program Overview
7.2. Program Graphical User Interface
7.3. Execution Result
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
Algorithm A1: Container Stowage Planning using Proximal Policy Optimization (PPO)
Input:
Output:
### Phase 1: Bay Selection ###
1: Initialize policy network πθ and value network Vφ with random weights
2: For each iteration do
3:   For each container i = 1 to N do
4:     Observe current state s_i (bay capacities, container weights, ship's LCG)
5:     Select action a_i (choose bay) based on πθ(s_i)
6:     Execute action a_i (stow container in the selected bay)
7:     Compute reward r_i based on the stowage (e.g., LCG balance, bay capacity)
8:     Store transition (s_i, a_i, r_i, s_i+1) in memory
9:   End for
### Phase 2: Row and Tier Selection ###
10:   For each container i = 1 to N do
11:     Observe current state s_i (row and tier capacities, POD numbers)
12:     Select action a_i (choose row and tier) based on πθ(s_i)
13:     Execute action a_i (stow container in the selected row and tier)
14:     Compute reward r_i based on rehandling avoidance and weight distribution
15:     Store transition (s_i, a_i, r_i, s_i+1) in memory
16:   End for
### Training and Optimization ###
17:   Compute advantages A_i using rewards and value network Vφ
18:   For each mini-batch of data do
19:     Compute the probability ratio r = πθ(a_i|s_i)/πθ_old(a_i|s_i)
20:     Clip the ratio r to [1 − ε, 1 + ε] to avoid large policy updates
21:     Update policy network πθ using the clipped loss function
22:     Update value network Vφ using the mean squared error between predicted and actual returns
23:   End for
24:   Apply updates to policy network πθ and value network Vφ
25: End for
26: Output the optimized stowage plan
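The training block of Algorithm A1 (steps 17–24) is the standard PPO clipped-surrogate update (Schulman et al.). As a point of reference, a minimal PyTorch sketch of one mini-batch update is given below; it is not the authors' implementation, and the tensor shapes, the discrete-action policy head, and the 0.5 weighting of the value loss are assumptions.

```python
import torch
import torch.nn.functional as F

def ppo_update(policy_net, value_net, optimizer, states, actions,
               old_log_probs, returns, advantages, clip_eps=0.1):
    """One mini-batch clipped-surrogate update (Algorithm A1, steps 17-24).
    Assumed shapes: states [B, obs_dim]; actions, old_log_probs, returns,
    advantages [B]."""
    logits = policy_net(states)                             # [B, n_actions]
    dist = torch.distributions.Categorical(logits=logits)
    new_log_probs = dist.log_prob(actions)                  # [B]

    # Probability ratio r = pi_new(a|s) / pi_old(a|s) (step 19)
    ratio = torch.exp(new_log_probs - old_log_probs)

    # Clipped surrogate policy loss (steps 20-21)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(surr1, surr2).mean()

    # Value loss: MSE between predicted values and returns (step 22)
    values = value_net(states).squeeze(-1)                  # [B]
    value_loss = F.mse_loss(values, returns)

    loss = policy_loss + 0.5 * value_loss                   # 0.5 weight is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return policy_loss.item(), value_loss.item()
```

The advantages A_i of step 17 would typically be computed beforehand (e.g., with generalized advantage estimation) from the stored transitions and the value network.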
References
- Hong, D.H. The Method of Container Loading Scheduling through Hierarchical Clustering. J. Korea Soc. Comput. Inf. 2005, 10, 201–208.
- Park, Y.K.; Kwak, K.S. Export container preprocessing method to decrease the number of rehandling in container terminal. J. Navig. Port Res. 2011, 35, 77–82.
- Akyüz, M.H.; Lee, C.Y. A Mathematical Formulation and Efficient Heuristics for the Dynamic Container Relocation Problem. Nav. Res. Logist. 2014, 61, 101–118.
- Dubrovsky, O.; Levitin, G.; Penn, M. A Genetic Algorithm with a Compact Solution Encoding for the Container Ship Stowage Problem. J. Heuristics 2002, 8, 585–599.
- Zhu, H. Integrated Containership Stowage Planning: A Methodology for Coordinating Containership Stowage Plan and Terminal Yard Operations. Sustainability 2022, 14, 13376.
- Chen-Fu, C.; Yu-Bin, L.; Kanchana, S.; Chia-Ching, P. Digital system for dynamic container loading with neural network-based memory exploiting hybrid genetic algorithm for carbon reduction. Comput. Ind. Eng. 2024, 191, 110149.
- Chang, Y.; Hamedi, M.; Haghani, A. Solving integrated problem of stowage planning with crane split by an improved genetic algorithm based on novel encoding mode. Meas. Control 2023, 56, 172–191.
- Junqueira, C.; Azevedo, A.T.; Ohishi, T. Solving the integrated multi-port stowage planning and container relocation problems with a genetic algorithm and simulation. Appl. Sci. 2022, 12, 8191.
- Wang, Y.; Shi, G.; Hirayama, K. Many-Objective Container Stowage Optimization Based on Improved NSGA-III. J. Mar. Sci. Eng. 2022, 10, 517.
- Parreño, F.; Pacino, D.; Alvarez-Valdes, R. A GRASP Algorithm for the Container Stowage Slot Planning Problem. Transp. Res. Part E 2016, 94, 141–157.
- Wei, L.; Wie, F.; Schmitz, S.; Kunal, K.; Noche, B. Optimization of Container Relocation Problem via Reinforcement Learning. Logist. J. Proc. 2021, 2021.
- Ling, Y.; Wang, Q.; Pan, L. Advancing multi-port container stowage efficiency: A novel DQN-LNS algorithmic solution. Knowl.-Based Syst. 2024, 299, 112074.
- Chen, P.; Wang, Q. Learning for multiple purposes: A Q-learning enhanced hybrid metaheuristic for parallel drone scheduling traveling salesman problem. Comput. Ind. Eng. 2024, 187, 109851.
- Karimi-Mamaghan, M.; Mohammadi, M.; Pasdeloup, B.; Meyer, P. Learning to select operators in meta-heuristics: An integration of Q-learning into the iterated greedy algorithm for the permutation flowshop scheduling problem. Eur. J. Oper. Res. 2023, 304, 1296–1330.
- Jiang, T.; Zeng, B.; Wang, Y.; Yan, W. A new heuristic reinforcement learning for container relocation problem. J. Phys. Conf. Ser. 2021, 1873, 012050.
- Shen, Y.; Zhao, N.; Xia, M.; Du, X. A Deep Q-Learning Network for Ship Stowage Planning Problem. Pol. Marit. Res. 2017, 24, 102–109.
- Shin, J.Y.; Ryu, H.S. Deep Q-Learning Network Model for Container Ship Master Stowage Plan. J. Korean Soc. Ind. Converg. 2021, 24, 19–29.
- Jeon, D.; Kim, G.; Lim, C.; Shin, S. Container Stowage Plan to Reduce Shifts Based on Reinforcement Learning. In Proceedings of the Korean Society of Ocean Science and Technology Conference, Jeju, Republic of Korea, 2 June 2022; pp. 515–521.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control Through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
Condition | Reward
---|---
 | +1
 | −2
 | −500
 | +1
 | −2
 | +1
 | −2
 | +1
 | −2
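The reward magnitudes above (+1, −2, −500) come from the table, but the condition descriptions do not survive in this version of the text. Purely as an illustration of how such a Phase 1 reward might be coded, the sketch below maps hypothetical conditions, based on Algorithm A1's mention of bay capacity and LCG balance, onto those values; the checks and thresholds are assumptions, not the paper's definitions.

```python
def phase1_reward(bay_load, bay_capacity, lcg_deviation, lcg_tolerance=0.05):
    """Hypothetical Phase 1 reward sketch. The condition checks and the
    lcg_tolerance threshold are illustrative assumptions; only the reward
    magnitudes (+1, -2, -500) are taken from the table."""
    if bay_load > bay_capacity:
        return -500      # infeasible placement: selected bay over capacity
    if lcg_deviation <= lcg_tolerance:
        return +1        # placement keeps the ship's LCG within tolerance
    return -2            # feasible placement that worsens longitudinal balance
```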
Parameter | Value |
---|---|
Learning rate | 0.0001 |
Gamma | 0.9 |
Lambda | 3 |
Clipping | 0.1 |
Number of Nodes | 128, 64, 32 |
Optimizer | Adam |
Parameter | Value |
---|---|
Learning rate | 0.0001 |
Gamma | 0.9 |
Lambda | 3 |
Clipping | 0.1 |
Number of Nodes | 128, 64, 32 |
Optimizer | Adam |
Parameter | Value |
---|---|
Learning rate | 0.0001 |
Gamma | 0.9 |
Lambda | 3 |
Clipping | 0.1 |
Number of Nodes | 128, 64, 32 |
Optimizer | Adam |
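The hyperparameters repeated in the tables above (learning rate 0.0001, gamma 0.9, clipping 0.1, hidden layers of 128, 64, and 32 nodes, Adam optimizer) can be wired into a small policy/value network pair as in the following sketch. The ReLU activations and the observation/action dimensions are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

def build_mlp(in_dim, out_dim, hidden=(128, 64, 32)):
    """MLP with the 128-64-32 hidden layers listed in the parameter tables.
    Activation choice and input/output sizes are assumptions."""
    layers, prev = [], in_dim
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

OBS_DIM, N_ACTIONS = 16, 4            # placeholders, not from the paper
policy_net = build_mlp(OBS_DIM, N_ACTIONS)
value_net = build_mlp(OBS_DIM, 1)

optimizer = torch.optim.Adam(
    list(policy_net.parameters()) + list(value_net.parameters()), lr=1e-4)

GAMMA, CLIP_EPS = 0.9, 0.1            # discount and clipping values from the tables
```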
Rehandling Occurs: O, Rehandling Does Not Occur: X | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Case | Container List (POD NO) | Rehandling Status | |||||||||||||||
1 | 3 | 2 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 3 | 2 | 1 | X |
2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | X |
3 | 4 | 3 | 2 | 2 | 4 | 4 | 3 | 3 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
4 | 4 | 3 | 3 | 1 | 4 | 3 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 3 | 3 | 2 | X |
5 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | X |
6 | 4 | 4 | 2 | 1 | 4 | 4 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
7 | 4 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | X |
8 | 3 | 2 | 2 | 3 | 4 | 4 | 3 | 3 | 2 | 2 | 3 | 3 | 1 | 1 | 1 | 1 | X |
9 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
Rehandling Occurs: O, Rehandling Does Not Occur: X | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Case | Container List (POD NO) | Rehandling Status | |||||||||||||||
1 | 3 | 2 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 3 | 2 | 1 | X |
2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | X |
3 | 4 | 3 | 2 | 2 | 4 | 4 | 3 | 3 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
4 | 4 | 3 | 3 | 1 | 4 | 3 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 3 | 3 | 2 | X |
5 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | X |
6 | 4 | 4 | 2 | 1 | 4 | 4 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
7 | 4 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | X |
8 | 3 | 2 | 2 | 3 | 4 | 4 | 3 | 3 | 2 | 2 | 3 | 3 | 1 | 1 | 1 | 1 | X |
9 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
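The Rehandling Status column in the two tables above can be checked mechanically: within a stack, no rehandling is forced as long as the POD numbers never increase from the bottom tier upward, i.e., every container is discharged no later than the containers beneath it. A small checker is sketched below; treating each group of four containers in a case as one bottom-to-top stack is a layout assumption made only for the example.

```python
def stack_needs_rehandling(stack_pods):
    """Return True if the stack (POD numbers listed bottom to top) forces at
    least one rehandle, i.e., some container sits under another that is
    discharged at a later port."""
    return any(upper > lower
               for lower, upper in zip(stack_pods, stack_pods[1:]))

# Case 2 of the table, read as four bottom-to-top stacks of four tiers
# (an assumption about the layout): every stack is non-increasing upward.
case2 = [[3, 3, 3, 3], [3, 3, 2, 2], [2, 2, 2, 1], [1, 1, 1, 1]]
print(any(stack_needs_rehandling(s) for s in case2))   # False -> status X
```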
Condition | Reward
---|---
 | +0.1
 | −0.2
 | +0.2
 | −0.4
 | +0.3
 | −0.6
Parameter | Value |
---|---|
Learning rate | 0.0001 |
Gamma | 0.9 |
Lambda | 3 |
Clipping | 0.1 |
Number of Nodes | 128, 64, 32 |
Optimizer | Adam |
Rehandling Occurs: O, Rehandling Does Not Occur: X | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Case | Container List (POD NO) | Rehandling Status | ||||||||||||||||
1 | POD | 3 | 2 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 3 | 2 | 1 | X |
Weight | 15 | 14 | 14 | 13 | 15 | 14 | 14 | 13 | 15 | 14 | 14 | 13 | 15 | 15 | 14 | 13 | ||
2 | POD | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | X |
Weight | 15 | 15 | 15 | 15 | 15 | 15 | 14 | 14 | 14 | 14 | 14 | 13 | 13 | 13 | 13 | 13 | ||
3 | POD | 4 | 3 | 2 | 2 | 4 | 4 | 3 | 3 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
Weight | 16 | 15 | 14 | 14 | 16 | 16 | 15 | 15 | 16 | 15 | 14 | 13 | 16 | 15 | 14 | 13 | ||
4 | POD | 4 | 3 | 3 | 1 | 4 | 3 | 2 | 1 | 3 | 2 | 2 | 1 | 3 | 3 | 3 | 2 | X |
Weight | 16 | 15 | 15 | 13 | 16 | 15 | 14 | 13 | 15 | 14 | 14 | 13 | 15 | 15 | 15 | 14 | ||
5 | POD | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | X |
Weight | 16 | 16 | 16 | 16 | 16 | 16 | 15 | 15 | 14 | 14 | 13 | 13 | 13 | 13 | 13 | 13 | ||
6 | POD | 4 | 4 | 2 | 1 | 4 | 4 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
Weight | 16 | 16 | 14 | 13 | 16 | 16 | 14 | 13 | 16 | 15 | 14 | 13 | 16 | 15 | 14 | 13 | ||
7 | POD | 4 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | X |
Weight | 16 | 15 | 15 | 15 | 15 | 14 | 14 | 14 | 16 | 16 | 15 | 15 | 14 | 14 | 13 | 13 | ||
8 | POD | 3 | 2 | 2 | 3 | 4 | 4 | 3 | 3 | 2 | 2 | 3 | 3 | 1 | 1 | 1 | 1 | X |
Weight | 15 | 14 | 14 | 15 | 16 | 16 | 15 | 15 | 14 | 14 | 15 | 15 | 13 | 13 | 13 | 13 | ||
9 | POD | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | 4 | 3 | 2 | 1 | X |
Weight | 16 | 15 | 14 | 13 | 16 | 15 | 14 | 13 | 16 | 15 | 14 | 13 | 16 | 15 | 14 | 13 |
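Section 5 adds a weight constraint on top of the POD ordering, and the weights in the table above are consistent with a "no heavier container on top of a lighter one" rule. The following sketch combines that rule with the rehandling check from Phase 2; it is an illustrative reading of the constraint, not the paper's exact formulation.

```python
def stack_is_feasible(pods, weights):
    """Check one stack (both lists given bottom to top): no container with an
    earlier POD buried under a later one (no rehandling) and no heavier
    container resting on a lighter one (illustrative weight constraint)."""
    no_rehandling = all(upper <= lower for lower, upper in zip(pods, pods[1:]))
    weight_ok = all(upper <= lower for lower, upper in zip(weights, weights[1:]))
    return no_rehandling and weight_ok

# Case 1 of the table, first four containers treated as one bottom-to-top
# stack (a layout assumption): PODs 3,2,2,1 and weights 15,14,14,13.
print(stack_is_feasible([3, 2, 2, 1], [15, 14, 14, 13]))   # True -> status X
```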
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).