Adaptive Supply Chain: Demand–Supply Synchronization Using Deep Reinforcement Learning
Abstract
1. Introduction
2. Related Work and Novelty
3. Methodology
3.1. Supply Chain Environment
- All stages except the raw material pool place replenishment orders.
- Replenishment orders are satisfied subject to the available production and inventory capacity.
- Demand arises and is satisfied from the inventory on hand at stage 0 (the retailer).
- Unsatisfied demand and replenishment orders are lost (backlogging is not allowed).
- Holding costs are charged for each unit of excess inventory. (These dynamics are sketched in code below.)
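A minimal Gym-style sketch of these dynamics is given below. It is an illustration only: the class name, the simplified reward (retail revenue minus holding costs), and the default parameter values (loosely based on the Experiment 1 configuration) are assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

class SimpleSupplyChainEnv:
    """Illustrative Gym-style multi-echelon environment with lost sales.

    Stage 0 is the retailer facing customer demand; the last stage is an
    unconstrained raw material pool. Parameter values are placeholders.
    """

    def __init__(self, init_inventory=(100, 100, 200), capacity=(100, 90, 80),
                 lead_time=(3, 5, 10), retail_price=2.0,
                 holding_cost=(0.15, 0.10, 0.05), demand_mean=20,
                 horizon=30, seed=0):
        self.init_inventory = np.array(init_inventory, dtype=float)
        self.c = np.array(capacity, dtype=float)      # capacity of the supplying stage for each order
        self.L = tuple(lead_time)                     # replenishment lead times
        self.retail_price = retail_price
        self.h = np.array(holding_cost, dtype=float)  # unit holding cost per inventory-holding stage
        self.demand_mean = demand_mean
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)
        self.m = len(init_inventory)                  # number of inventory-holding stages
        self.reset()

    def reset(self):
        self.t = 0
        self.inv = self.init_inventory.copy()
        # Outstanding shipments to each stage, one slot per lead-time period.
        self.pipeline = [[0.0] * self.L[i] for i in range(self.m)]
        return self._obs()

    def _obs(self):
        return np.concatenate([self.inv] + [np.asarray(p) for p in self.pipeline])

    def step(self, orders):
        """orders[i]: replenishment order placed by stage i on stage i + 1."""
        orders = np.asarray(orders, dtype=float)
        # 1. Shipments placed L[i] periods ago arrive at stage i.
        for i in range(self.m):
            self.inv[i] += self.pipeline[i].pop(0)
        # 2. Replenishment orders are filled subject to the supplier's capacity and
        #    inventory (the raw material pool is unconstrained); the unfilled part is lost.
        for i in range(self.m):
            upstream_inv = self.inv[i + 1] if i + 1 < self.m else np.inf
            shipped = min(orders[i], self.c[i], upstream_inv)
            if i + 1 < self.m:
                self.inv[i + 1] -= shipped
            self.pipeline[i].append(shipped)
        # 3. Customer demand hits the retailer; unmet demand is lost (no backlogging).
        demand = self.rng.poisson(self.demand_mean)
        sold = min(demand, self.inv[0])
        self.inv[0] -= sold
        # 4. Simplified reward: retail revenue minus holding costs on remaining inventory
        #    (a full model would also include procurement costs and lost-sales penalties).
        reward = self.retail_price * sold - float(np.dot(self.h, self.inv))
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon, {}
```

An agent interacts with the sketch through reset() and step(orders), where orders contains one replenishment quantity per inventory-holding stage, mirroring the rules listed above.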
3.2. Proximal Policy Optimization Algorithm
Algorithm 1 Actor–Critic Style within PPO

for iteration = 1, 2, ... do
    for actor = 1, 2, ..., N do
        Run policy π_θ_old in the environment for T timesteps
        Compute advantage estimates Â_1, ..., Â_T
    end for
    Optimize the surrogate loss with respect to θ
    θ_old ← θ
end for
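To make the "optimize the surrogate loss" step concrete, the sketch below evaluates PPO's clipped surrogate objective on a batch of transitions. It is a generic NumPy illustration with hypothetical array names, not the training code used for the experiments in this paper.

```python
import numpy as np

def ppo_clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective L^CLIP from PPO.

    new_logp, old_logp: log-probabilities of the taken actions under the current
    policy pi_theta and the data-collecting policy pi_theta_old.
    advantages: advantage estimates A_hat_t for the same transitions.
    Returns the objective to be *maximized* (negate it for a gradient-descent loss).
    """
    ratio = np.exp(new_logp - old_logp)                       # r_t(theta)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)  # clip(r_t, 1 - eps, 1 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Hypothetical batch of transitions collected by the N actors in Algorithm 1.
rng = np.random.default_rng(0)
old_logp = rng.normal(-1.0, 0.3, size=512)
new_logp = old_logp + rng.normal(0.0, 0.1, size=512)  # policy after one update step
advantages = rng.normal(0.0, 1.0, size=512)
print(ppo_clipped_surrogate(new_logp, old_logp, advantages))
```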
4. Numerical Experiment
4.1. Implementation and Configurations
4.2. Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Supply chain configuration for Experiment 1.

Parameter | Stage 0 | Stage 1 | Stage 2 | Stage 3
---|---|---|---|---
D | 20 | - | - | - |
I0 | 100 | 100 | 200 | ∞ |
ρ | 2 | 1.5 | 1 | 0.5 |
r | 1.5 | 1.0 | 0.75 | 0.5 |
k | 0.1 | 0.075 | 0.05 | 0.025 |
h | 0.15 | 0.10 | 0.5 | - |
c | - | 100 | 90 | 80 |
L | 3 | 5 | 10 | - |
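For illustration, the Experiment 1 parameters above can be packaged into a plain configuration dictionary like the hypothetical one below; the dictionary layout is an assumption for this sketch rather than the configuration format of any particular library, and the "-" cells are simply omitted per stage.

```python
import numpy as np

# Hypothetical per-stage configuration mirroring the Experiment 1 table.
# I0 = inf marks the unconstrained raw material pool at stage 3.
experiment_1_config = {
    "D":   {0: 20},
    "I0":  {0: 100, 1: 100, 2: 200, 3: np.inf},
    "rho": {0: 2.0, 1: 1.5, 2: 1.0, 3: 0.5},
    "r":   {0: 1.5, 1: 1.0, 2: 0.75, 3: 0.5},
    "k":   {0: 0.1, 1: 0.075, 2: 0.05, 3: 0.025},
    "h":   {0: 0.15, 1: 0.10, 2: 0.5},
    "c":   {1: 100, 2: 90, 3: 80},
    "L":   {0: 3, 1: 5, 2: 10},
}
```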
Supply chain configuration for Experiment 2.

Parameter | Stage 0 | Stage 1 | Stage 2 | Stage 3 | Stage 4
---|---|---|---|---|---
D | 35 | - | - | - | - |
I0 | 100 | 120 | 160 | 200 | ∞ |
ρ | 3.5 | 2.5 | 1.5 | 1 | 0.5 |
r | 1.0 | 1.0 | 0.75 | 0.5 | 0.5 |
k | 0.10 | 0.10 | 0.05 | 0.05 | 0.025 |
h | 0.15 | 0.10 | 0.05 | 0.05 | - |
c | - | 150 | 150 | 150 | 150 |
L | 3 | 5 | 5 | 5 | - |
Supply chain configuration for Experiment 3.

Parameter | Stage 0 | Stage 1 | Stage 2 | Stage 3 | Stage 4
---|---|---|---|---|---
D | 20 | - | - | - | - |
I0 | 100 | 120 | 150 | 150 | ∞ |
ρ | 4.1 | 2 | 1.5 | 1 | 0.5 |
r | 1.0 | 1.0 | 0.75 | 0.5 | 0.5 |
k | 0.10 | 0.075 | 0.05 | 0.025 | 0.025 |
h | 0.10 | 0.10 | 0.05 | 0.05 | - |
c | - | 120 | 100 | 100 | 90 |
L | 3 | 5 | 7 | 7 | - |
Hardware specification of the computational environment.

Parameter | Specification
---|---
GPU Model Name | Nvidia K80 |
GPU Memory | 12 GB |
GPU Memory Clock | 0.82 GHz |
GPU Performance | 4.1 TFLOPS |
CPU Model Name | Intel(R) Xeon(R) |
CPU Frequency | 2.30 GHz |
Number of CPU Cores | 2 |
Available RAM | 12 GB |
Disk Space | 25 GB |
Comparison of the PPO policy and the base-stock policy across the three experiments.

Experiment | Mean | Std. | Coefficient of Variation | Confidence Interval (95%)
---|---|---|---|---
Experiment 1 (PPO) | 425.6 | 19.4 | 0.046 | 425.6 ± 3.12 |
Experiment 1 (base-stock) | 414.3 | 26.5 | 0.064 | 414.3 ± 4.75 |
Experiment 2 (PPO) | 330.9 | 14.6 | 0.044 | 330.9 ± 2.32 |
Experiment 2 (base-stock) | −40.1 | 9.8 | −0.241 | −40.15 ± 1.56 |
Experiment 3 (PPO) | 328.7 | 93.1 | 0.283 | 328.7 ± 16.5 |
Experiment 3 (base-stock) | −43.4 | 7.6 | −0.175 | −43.4 ± 1.4 |
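As a generic illustration of how the summary statistics in this table can be produced from a set of evaluation episodes, the sketch below computes the mean, standard deviation, coefficient of variation, and a 95% normal-approximation confidence interval. The number of evaluation episodes and the exact interval construction used for the table are not stated in this excerpt, so both are assumptions here.

```python
import numpy as np

def summarize(rewards, z=1.96):
    """Mean, standard deviation, coefficient of variation, and a 95%
    normal-approximation confidence interval (z = 1.96). Illustration only."""
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean()
    std = rewards.std(ddof=1)
    cv = std / mean
    half_width = z * std / np.sqrt(rewards.size)
    return mean, std, cv, (mean - half_width, mean + half_width)

# Hypothetical evaluation run: cumulative rewards from 100 test episodes.
episode_rewards = np.random.default_rng(1).normal(425.6, 19.4, size=100)
print(summarize(episode_rewards))
```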