Online Planning for Autonomous Mobile Robots with Different Objectives in Warehouse Commissioning Task
Abstract
1. Introduction
- We propose an environment in which each agent picks up its assigned items under constraints that prohibit collisions during agent movement. We formulate the environment as a multi-agent MDP and discuss how the next state is determined from the current state and the actions of all agents;
- We propose a behavioral-planning method for transporting items efficiently, together with a mechanism for exchanging allocated items between agents, and present an action-planning method based on MCTS;
- To improve the achievement rate, we propose a negotiation protocol and exchange negotiation strategies that facilitate the exchange of allocated items between agents;
- The simulation results comprehensively demonstrate the achievement rates of the item-exchange strategies under various test settings.
2. Related Work
3. Problem
3.1. Spatial Task-Allocation Problem and Markov Decision Process
- There are N agents;
- There is a graph G, and each agent is located at one of the nodes in the graph;
- One node, referred to as the depot (stop), is fixed;
- Each agent has a fixed capacity c;
- If an agent is already executing c tasks, it cannot take on new tasks until it resets at the depot;
- There are four types of agent actions, i.e., do nothing, move to an adjacent node, execute a task at the current node, and reset;
- All agents act simultaneously at each step;
- Each node v has a fixed probability p_v, and a new task appears at that node at each step with probability p_v;
- The state changes every step, and the agent receives rewards according to its actions;
- The agent has full state observability.
- S: Set of agent states;
- A: Set of possible actions;
- P: Transition function;
- R: Reward function;
- D: Agent set;
- G: Graph;
- I: The set of agent states;
- T: The set of node states;
- A: The set of possible actions;
- P: Transition function;
- R: Reward function;
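To make the tuple concrete, the sketch below renders its components as Python types. All names and encodings here are illustrative assumptions, not the paper's exact formalization.

```python
# Minimal sketch of the MMDP components listed above; field names and
# the action encoding are assumptions chosen to mirror the list.
from dataclasses import dataclass
from typing import Callable, Dict

Node = int
AgentId = int

@dataclass(frozen=True)
class AgentState:            # an element of I: location plus carried tasks
    pos: Node
    carrying: int            # number of tasks held, at most the capacity c

@dataclass
class JointState:            # an element of S
    agents: Dict[AgentId, AgentState]   # one AgentState per agent in D
    tasks: Dict[Node, int]              # an element of T: pending tasks per node

Action = str                 # "noop" | "move:<node>" | "execute" | "reset"
JointAction = Dict[AgentId, Action]
TransitionFn = Callable[[JointState, JointAction], JointState]    # P
RewardFn = Callable[[JointState, JointAction, AgentId], float]    # R
```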
3.2. Target Problem
- Stay in place;
- Move to an adjacent node;
- Collect the item;
- Drop the item at the depot.
- D: Agent set;
- G: Graph;
- I: The set of possible states of the agent;
- T: The set of possible states of a node;
- A: The set of possible actions for the agent;
- P: Transition function;
- R: Reward function;
No-Collision Constraints
Algorithm 1 NextPos
Algorithm 2 Check
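The bodies of Algorithms 1 and 2 are not reproduced above. The following is a minimal sketch of how a no-collision next-position computation might look: intended moves that would produce a vertex conflict (two agents on one node) or an edge swap are cancelled. The function names follow the algorithm captions, but the state representation, the cancel-rather-than-penalize resolution, and the tie-breaking are all assumptions.

```python
# Sketch of collision-free movement resolution; all internals are assumptions.
from typing import Dict, Optional

AgentId = int
Node = int

def next_pos(pos: Dict[AgentId, Node],
             target: Dict[AgentId, Optional[Node]]) -> Dict[AgentId, Node]:
    """Return collision-free next positions; conflicting moves are cancelled."""
    intended = {a: (t if t is not None else pos[a]) for a, t in target.items()}
    # Repeat until stable: cancelling one move can create a new conflict
    # with an agent that now stays on its node.
    changed = True
    while changed:
        changed = False
        for a in intended:
            if intended[a] != pos[a] and not check(a, pos, intended):
                intended[a] = pos[a]   # cancel the conflicting move
                changed = True
    return intended

def check(a: AgentId, pos: Dict[AgentId, Node],
          intended: Dict[AgentId, Node]) -> bool:
    """True iff agent a's intended move collides with no other agent."""
    for b in intended:
        if b == a:
            continue
        if intended[b] == intended[a]:                        # vertex conflict
            return False
        if intended[b] == pos[a] and pos[b] == intended[a]:   # edge swap
            return False
    return True
```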
4. Fully Decoupled UCT
4.1. Monte Carlo Tree Search
- Selection: Begin from the root node R and select actions according to an appropriate algorithm until an unexpanded node L is reached;
- Expansion: Expand node L if the number of visits to it is greater than or equal to a threshold, so that the search can proceed more deeply through it on subsequent visits;
- Simulation (rollout, playout): Begin from L and select actions according to an appropriate algorithm until the end of the game (or a certain depth) is reached, and then calculate the cumulative reward obtained;
- Backpropagation: Propagate the reward from L back to R and update the average reward obtained when each action is selected at each node.
Algorithm 3 MCTS
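Since the body of Algorithm 3 is not reproduced above, the following is a generic MCTS loop implementing the four phases just described. The simulator interface (a `step(state, action)` function returning a next state and reward, and a `legal_actions(state)` enumerator) and all defaults are assumptions; the parameter names deliberately mirror numIters, maxDepth, and expandThresh from Section 6.1.

```python
import math
import random

class TreeNode:
    """Search-tree node: per-action children plus visit/value statistics."""
    def __init__(self):
        self.children = {}   # action -> TreeNode
        self.n = 0           # visit count
        self.q = 0.0         # mean discounted return

def ucb1(child, parent_n, c):
    if child.n == 0:
        return float("inf")
    return child.q + c * math.sqrt(math.log(parent_n) / child.n)

def mcts(root_state, legal_actions, step, iters=1000, max_depth=20,
         expand_thresh=2, c=math.sqrt(2), gamma=0.9):
    root = TreeNode()
    for _ in range(iters):
        node, state = root, root_state
        path, ret, disc, depth = [], 0.0, 1.0, 0
        # Selection: descend with UCB1 while the node has children.
        while node.children:
            a = max(node.children,
                    key=lambda act: ucb1(node.children[act], node.n, c))
            state, r = step(state, a)
            ret += disc * r
            disc *= gamma
            depth += 1
            path.append(node)
            node = node.children[a]
        path.append(node)
        # Expansion: only once the leaf has been visited often enough.
        if node.n >= expand_thresh:
            for a in legal_actions(state):
                node.children[a] = TreeNode()
        # Simulation: uniform-random rollout to the depth limit.
        while depth < max_depth:
            state, r = step(state, random.choice(legal_actions(state)))
            ret += disc * r
            disc *= gamma
            depth += 1
        # Backpropagation: update running means along the visited path.
        for visited in path:
            visited.n += 1
            visited.q += (ret - visited.q) / visited.n
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].n)
```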
4.2. Upper Confidence Tree
4.3. Agent Behavior Prediction
Algorithm 4 Iterative Greedy
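The algorithm body is not shown here; the sketch below illustrates one common reading of iterative greedy allocation for predicting which items the other agents will serve: the globally cheapest (agent, item) pair is assigned repeatedly until no items remain. The cost model (a shortest-path distance function `dist` on the warehouse graph) and all names are assumptions for illustration.

```python
# Iterative greedy task allocation sketch; `dist(u, v)` is assumed to give
# the graph distance between nodes, and agents/items are keyed by node.
def iterative_greedy(agent_pos, item_nodes, dist):
    routes = {a: [] for a in agent_pos}   # predicted visiting order per agent
    end = dict(agent_pos)                 # current endpoint of each route
    cost = {a: 0.0 for a in agent_pos}    # accumulated travel cost per agent
    remaining = set(item_nodes)
    while remaining:
        # Pick the (agent, item) pair with the smallest completion time.
        a, item = min(((a, i) for a in agent_pos for i in remaining),
                      key=lambda p: cost[p[0]] + dist(end[p[0]], p[1]))
        cost[a] += dist(end[a], item)
        end[a] = item
        routes[a].append(item)
        remaining.remove(item)
    return routes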
4.4. Decoupled UCT
4.5. Fully Decoupled UCT
Algorithm 5 Fully Decoupled UCT
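Algorithm 5 itself is not reproduced above. As a rough illustration of the decoupling idea, the wrapper below lets an agent reuse the generic MCTS loop while searching only over its own actions, with every other agent's action supplied by a fixed predicted policy (for example, routes produced by the iterative greedy allocation). `joint_step`, `predicted_policy`, and the per-agent reward dictionary are assumed interfaces, not the paper's exact Algorithm 5.

```python
# Decoupled planning sketch: agent `me` searches its own action set only.
def make_decoupled_step(me, others, predicted_policy, joint_step):
    def step(state, my_action):
        # Fix the other agents' actions via the predicted policy.
        joint = {a: predicted_policy(a, state) for a in others}
        joint[me] = my_action
        next_state, rewards = joint_step(state, joint)  # rewards per agent
        return next_state, rewards[me]
    return step
```

With such a wrapper, the per-node branching factor of each agent's search drops from |A|^N for the joint problem to |A|, which is what keeps online planning tractable as the number of agents grows.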
5. Item-Exchange Protocol and Agent Strategies
5.1. Item-Exchange Protocol
- Each agent selects at most one of its assigned items that it would like another agent to transport on its behalf and submits a request;
- Each agent selects and responds to at most one request that it is willing to accept;
- The requesting agent selects a single agent from those willing to accept the request and then sends an acceptance message to that agent.
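The three rounds above translate directly into a message-passing loop. In the sketch below, each agent is assumed to expose three strategy callbacks (`choose_request`, `choose_accept`, `nominate`) and an `items` list; the Agent interface and tie-breaking are assumptions for illustration.

```python
# One execution of the three-round item-exchange protocol (sketch).
def run_exchange(agents):
    # Round 1: each agent may ask for one of its items to be taken over.
    requests = {a: a.choose_request() for a in agents}
    requests = {a: item for a, item in requests.items() if item is not None}
    # Round 2: each agent may volunteer for at most one other agent's request.
    volunteers = {}   # requester -> list of willing agents
    for a in agents:
        others = {req: item for req, item in requests.items() if req is not a}
        chosen_req = a.choose_accept(others)   # a requester, or None
        if chosen_req is not None:
            volunteers.setdefault(chosen_req, []).append(a)
    # Round 3: each requester nominates one volunteer; the item changes hands.
    for requester, willing in volunteers.items():
        winner = requester.nominate(willing)
        requester.items.remove(requests[requester])
        winner.items.append(requests[requester])
```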
5.2. Exchange Strategy
5.2.1. Request Strategy
- Select the item closest to the depot (NEAREST_FROM_DEPOT);
- Select the item most distant from the depot (FARTHEST_FROM_DEPOT);
- Select an item at random (RANDOM).
5.2.2. Acceptance Strategy
- Select the request closest to the depot (NEAREST_FROM_DEPOT);
- Select the request most distant from the depot (FARTHEST_FROM_DEPOT);
- Select a request at random (RANDOM).
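The three request rules and the three acceptance rules reduce to the same selection primitive applied to different candidate sets: the agent's own items for requests, the received requests for acceptance. A minimal sketch, assuming a `dist_to_depot` callable that returns the graph distance from a candidate's node to the depot:

```python
import random

def nearest_from_depot(candidates, dist_to_depot):
    """NEAREST_FROM_DEPOT: pick the candidate closest to the depot."""
    return min(candidates, key=dist_to_depot)

def farthest_from_depot(candidates, dist_to_depot):
    """FARTHEST_FROM_DEPOT: pick the candidate farthest from the depot."""
    return max(candidates, key=dist_to_depot)

def pick_random(candidates, dist_to_depot=None):
    """RANDOM: pick uniformly at random (distance is ignored)."""
    return random.choice(list(candidates))
```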
5.2.3. Nomination Strategy
6. Multi-Agent Warehouse-Commissioning Simulator
6.1. Input
- numAgents: Number of agents;
- lastTurn: Number of steps to perform the simulation;
- newItemProb: Probability that a new item will appear;
- numIters: Number of iterations in the FDUCT algorithm;
- maxDepth: Maximum depth to search in the FDUCT algorithm;
- expandThresh: Expansion threshold in the FDUCT algorithm;
- reward: Reward obtained when an agent collects an item and unloads it at the depot;
- penalty: Penalty (i.e., a negative reward) given to an agent when it collides with another agent;
- discountFactor: Discount rate in cumulative reward calculation;
- randSeed: Random number seed value;
- enableExchange: Whether item exchange between agents is enabled;
- requestStrategy: Request strategy;
- acceptStrategy: Acceptance strategy;
- nominateStrategy: Nomination strategy;
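These inputs map naturally onto a configuration record. The following is a minimal sketch: field names follow the list above, the numeric defaults mirror the common-settings table given later in this article, and the dataclass itself plus the strategy defaults are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SimConfig:
    """Simulator inputs; numeric defaults follow the common-settings table."""
    numAgents: int = 3
    lastTurn: int = 100
    newItemProb: float = 0.1
    numIters: int = 20_000
    maxDepth: int = 20
    expandThresh: int = 2
    reward: float = 100.0
    penalty: float = -5.0
    discountFactor: float = 0.9
    randSeed: int = 123
    enableExchange: bool = True
    # Strategy defaults below are arbitrary examples, not from the paper.
    requestStrategy: str = "NEAREST_FROM_DEPOT"
    acceptStrategy: str = "FARTHEST_FROM_DEPOT"
    nominateStrategy: str = "RANDOM"
```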
6.2. Output
- The number of items that appeared;
- The number of items that could be carried;
- The achievement rate for each agent.
7. Evaluation
7.1. Common Settings
7.2. Grid Graphs for Warehouses
7.3. Evaluation Results
7.3.1. Warehouse-Small
7.3.2. Warehouse-Medium
7.3.3. Warehouse-Large
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mukhopadhyay, A.; Pettet, G.; Samal, C.; Dubey, A.; Vorobeychik, Y. An Online Decision-Theoretic Pipeline for Responder Dispatch. In Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, New York, NY, USA, 16–18 April 2019; pp. 185–196.
- Pettet, G.; Mukhopadhyay, A.; Kochenderfer, M.J.; Dubey, A. Hierarchical Planning for Dynamic Resource Allocation in Smart and Connected Communities. ACM Trans. Cyber Phys. Syst. 2022, 6, 1–26.
- Wurman, P.; D’Andrea, R.; Mountz, M. Coordinating Hundreds of Cooperative, Autonomous Vehicles in Warehouses. AI Mag. 2008, 29, 9–20.
- Henn, S.; Koch, S.; Wäscher, G. Order Batching in Order Picking Warehouses: A Survey of Solution Approaches. In Warehousing in the Global Supply Chain: Advanced Models, Tools and Applications for Storage Systems; Manzini, R., Ed.; Springer: London, UK, 2012; pp. 105–137.
- Claes, D.; Oliehoek, F.; Baier, H.; Tuyls, K. Decentralised Online Planning for Multi-Robot Warehouse Commissioning. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, Richland, SC, USA, 8–12 May 2017; pp. 492–500.
- Claes, D.; Robbel, P.; Oliehoek, F.A.; Tuyls, K.; Hennes, D.; van der Hoek, W. Effective Approximations for Multi-Robot Coordination in Spatially Distributed Tasks. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, Istanbul, Turkey, 4–8 May 2015; pp. 881–890.
- Kocsis, L.; Szepesvári, C. Bandit Based Monte-Carlo Planning. In Machine Learning: ECML 2006; Fürnkranz, J., Scheffer, T., Spiliopoulou, M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4212, pp. 282–293.
- Braquet, M.; Bakolas, E. Greedy Decentralized Auction-based Task Allocation for Multi-Agent Systems. IFAC-PapersOnLine 2021, 54, 675–680.
- Schneider, E.; Sklar, E.I.; Parsons, S.; Özgelen, A.T. Auction-Based Task Allocation for Multi-robot Teams in Dynamic Environments. In Towards Autonomous Robotic Systems; Dixon, C., Tuyls, K., Eds.; Springer: Cham, Switzerland, 2015; pp. 246–257.
- Li, M.; Yang, W.; Cai, Z.; Yang, S.; Wang, J. Integrating Decision Sharing with Prediction in Decentralized Planning for Multi-Agent Coordination under Uncertainty. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; AAAI Press: Washington, DC, USA, 2019; pp. 450–456.
- Wu, J.; Durfee, E.H. Resource-Driven Mission-Phasing Techniques for Constrained Agents in Stochastic Environments. J. Artif. Intell. Res. 2010, 38, 415–473.
- Kartal, B.; Nunes, E.; Godoy, J.; Gini, M. Monte Carlo Tree Search for Multi-Robot Task Allocation. Proc. AAAI Conf. Artif. Intell. 2016, 30.
- Haouassi, M.; Kergosien, Y.; Mendoza, J.E.; Rousseau, L.M. The integrated orderline batching, batch scheduling, and picker routing problem with multiple pickers: The benefits of splitting customer orders. Flex. Serv. Manuf. J. 2021, 34, 614–645.
- Boutilier, C. Planning, learning and coordination in multiagent decision processes. In Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, Renesse, The Netherlands, 17–20 March 1996; Morgan Kaufmann: San Francisco, CA, USA, 1996; pp. 195–210.
- Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; John Wiley & Sons: New York, NY, USA, 1994.
- Silver, D.; Huang, A.; Maddison, C.; Guez, A.; Sifre, L.; Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
- Mahajan, A.; Teneketzis, D. Multi-Armed Bandit Problems. In Foundations and Applications of Sensor Management; Hero, A.O., Castañón, D.A., Cochran, D., Kastella, K., Eds.; Springer: Boston, MA, USA, 2008; pp. 121–151.
- Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time Analysis of the Multiarmed Bandit Problem. Mach. Learn. 2002, 47, 235–256.
- Lanctot, M.; Wittlinger, C.; Winands, M.H.; Den Teuling, N.G. Monte Carlo tree search for simultaneous move games: A case study in the game of Tron. In Proceedings of the Benelux Conference on Artificial Intelligence (BNAIC), Delft, The Netherlands, 7–8 November 2013; Delft University of Technology: Delft, The Netherlands, 2013; pp. 104–111.
Action | A | B | C
---|---|---|---
Number of selections | 4 | 3 | 3
Cumulative reward | 10 | 5 | 40
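A quick worked check of UCB1 on the statistics above: each action's score is its mean reward plus an exploration bonus. The exploration constant C = √2 and the use of raw (unnormalized) cumulative rewards are assumptions for illustration.

```python
import math

# action -> (number of selections n_j, cumulative reward), from the table.
stats = {"A": (4, 10), "B": (3, 5), "C": (3, 40)}
n = sum(nj for nj, _ in stats.values())   # total selections: 10

for action, (nj, total) in stats.items():
    ucb = total / nj + math.sqrt(2) * math.sqrt(math.log(n) / nj)
    print(action, round(ucb, 2))   # A 3.57, B 2.91, C 14.57 -> C is selected
```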
Parameter | Value
---|---
numAgents | 3
lastTurn | 100
newItemProb | 0.1
numIters | 20,000
maxDepth | 20
expandThresh | 2
reward | 100
penalty | −5
discountFactor | 0.9
randSeed | 123