Leveraging Multi-Agent Reinforcement Learning for Digital Transformation in Supply Chain Inventory Optimization
Abstract
1. Introduction
- Our proposed method requires sharing only limited information during the training stage, overcoming the limitation of existing approaches that require full information visibility across the supply chain (SC).
- The proposed method incorporates the SC network topology to optimize agent training, classifying SC firms into distinct groups and leveraging their relationships to enhance the training process.
- Simulation-based evaluations demonstrate the superior performance of PMaRL over current optimization methods in practical scenarios built on real-world SC networks.
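The first two contributions describe a centralized-training, decentralized-execution setup: each firm's policy acts on local state only, while a training-time critic consumes a limited shared view. The sketch below illustrates that split; it is not the paper's PMaRL implementation — the class names, dimensions, softmax policy, and TD-style critic update are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class Agent:
    """One SC firm. Its policy network sees only the firm's *local* observation."""
    def __init__(self, obs_dim, n_actions):
        self.w = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def act(self, local_obs):
        # Softmax policy over replenishment actions, from local state only.
        logits = local_obs @ self.w
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return int(rng.choice(len(p), p=p)), p

class CentralCritic:
    """Used only during training: scores the *limited shared* joint observation."""
    def __init__(self, joint_dim):
        self.v = np.zeros(joint_dim)

    def value(self, joint_obs):
        return float(self.v @ joint_obs)

    def update(self, joint_obs, target, lr=0.05):
        # One TD-style step toward the target; returns the pre-update error.
        td = target - self.value(joint_obs)
        self.v += lr * td * joint_obs
        return td

# Two firms with 3-dimensional local observations (toy numbers).
agents = [Agent(obs_dim=3, n_actions=4) for _ in range(2)]
critic = CentralCritic(joint_dim=6)

obs = [np.array([1.0, 0.5, -0.5]), np.array([0.2, -1.0, 0.3])]
joint = np.concatenate(obs)   # the limited information shared for training

errs = [abs(critic.update(joint, target=1.0)) for _ in range(50)]
```

At execution time only `Agent.act` is needed; the critic and the shared `joint` vector can be discarded, which is what allows information sharing to stop once training ends.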
2. Literature Review
2.1. Information Sharing
2.2. Inventory Optimization Approach
2.3. Reinforcement Learning
3. Research Objective
- Cooperation with minor competition;
- Incomplete information;
- Prisoners’ dilemmas.
4. Methods
4.1. Fundamentals
4.2. Policy Networks
4.3. Critic Networks
5. Experiments and Results
5.1. Simulation Setup
5.2. SC Inventory Management Settings
5.3. Results—Comparison with Traditional Methods
5.4. Results—Comparison Among Different Multi-Agent-Based RL Methods
5.5. Results—Training Time and Convergence
5.6. Discussion
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| SC Network | No. Stages | No. Nodes | No. Edges | Network Complexity |
|---|---|---|---|---|
| Beer Game (BG) | 4 | 4 | 3 | 1.5 |
| Real Network 1 (R1) | 5 | 17 | 18 | 2.82 |
| Real Network 2 (R2) | 4 | 22 | 39 | 1.91 |
| Real Network 3 (R3) | 5 | 27 | 31 | 2.27 |
| SC Network | Retailers | Retailer Demand | Action Space |
|---|---|---|---|
| BG | 1 | {−8, 24} | |
| R1 | 4 | {−4, 10} | |
| R2 | 9 | {−3, 6} | |
| R3 | 8 | {−3, 6} | |
| Method | Critic Network | Policy Network |
|---|---|---|
| MAAC | Without SC network topology | Local information |
| PMaRL | With SC network topology | Local information |
| FV | With SC network topology | Global information |
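The three configurations in the table above differ only in what each network is allowed to see. A hypothetical sketch of those input assemblies follows — the adjacency matrix, observation dimensions, and function names are illustrative assumptions, not the paper's networks or code.

```python
import numpy as np

# Hypothetical 4-node serial SC (supplier -> ... -> retailer); entry [i, j] = 1
# means node i supplies node j. Purely illustrative.
adj = np.array([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])

local_obs = [np.full(2, float(i)) for i in range(4)]   # toy 2-dim local states
global_state = np.concatenate(local_obs)

def policy_input(i, local_obs, global_state, method):
    # Only FV's policy network consumes global information at execution time.
    return global_state if method == "FV" else local_obs[i]

def critic_input(i, local_obs, method):
    if method == "MAAC":
        # Critic sees every agent but ignores the SC network topology.
        return np.concatenate(local_obs)
    # Topology-aware critic: restricted to agent i and its SC neighbours.
    neighbours = np.flatnonzero(adj[i] | adj[:, i])
    return np.concatenate([local_obs[i], *[local_obs[j] for j in neighbours]])
```

The topology-aware critic input grows with a node's degree rather than with the size of the whole network, which is one plausible reason to exploit SC structure during training.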
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, B.; Tan, W.J.; Cai, W.; Zhang, A.N. Leveraging Multi-Agent Reinforcement Learning for Digital Transformation in Supply Chain Inventory Optimization. Sustainability 2024, 16, 9996. https://doi.org/10.3390/su16229996