navigation scheme
RTAW: An Attention Inspired Reinforcement Learning Method for Multi-Robot Task Allocation in Warehouse Environments
Agrawal, Aakriti, Bedi, Amrit Singh, Manocha, Dinesh
We present a novel reinforcement learning based algorithm for multi-robot task allocation problem in warehouse environments. We formulate it as a Markov Decision Process and solve via a novel deep multi-agent reinforcement learning method (called RTAW) with attention inspired policy architecture. Hence, our proposed policy network uses global embeddings that are independent of the number of robots/tasks. We utilize proximal policy optimization algorithm for training and use a carefully designed reward to obtain a converged policy. The converged policy ensures cooperation among different robots to minimize total travel delay (TTD) which ultimately improves the makespan for a sufficiently large task-list. In our extensive experiments, we compare the performance of our RTAW algorithm to state of the art methods such as myopic pickup distance minimization (greedy) and regret based baselines on different navigation schemes. We show an improvement of upto 14% (25-1000 seconds) in TTD on scenarios with hundreds or thousands of tasks for different challenging warehouse layouts and task generation schemes. We also demonstrate the scalability of our approach by showing performance with up to $1000$ robots in simulations.
- North America > United States > Maryland (0.04)
- Europe > Hungary (0.04)
- Asia > Middle East > Jordan (0.04)
DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments
Agrawal, Aakriti, Hariharan, Senthil, Bedi, Amrit Singh, Manocha, Dinesh
We present a novel reinforcement learning (RL) based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments. Our approach is designed for scenarios in which multiple robots are used to perform various pick up and delivery tasks. We consider the problem of joint decentralized task allocation and navigation and present a two level approach to solve it. At the higher level, we solve the task allocation by formulating it in terms of Markov Decision Processes and choosing the appropriate rewards to minimize the Total Travel Delay (TTD). At the lower level, we use a decentralized navigation scheme based on ORCA that enables each robot to perform these tasks in an independent manner, and avoid collisions with other robots and dynamic obstacles. We combine these lower and upper levels by defining rewards for the higher level as the feedback from the lower level navigation algorithm. We perform extensive evaluation in complex warehouse layouts with large number of agents and highlight the benefits over state-of-the-art algorithms based on myopic pickup distance minimization and regret-based task selection. We observe improvement up to 14% in terms of task completion time and up-to 40% improvement in terms of computing collision-free trajectories for the robots.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > New York (0.04)
- Research Report (0.64)
- Overview (0.46)
Multi-Select Faceted Navigation Based on Minimum Description Length Principle
He, Chao (Chinese Academy of Sciences) | Cheng, Xueqi (Chinese Academy of Sciences) | Guo, Jiafeng (Chinese Academy of Sciences) | Shen, Huawei (Chinese Academy of Sciences)
Faceted navigation can effectively reduce user efforts of reaching targeted resources in databases, by suggesting dynamic facet values for iterative query refinement. A key issue is minimizing the navigation cost in a user query session. Conventional navigation scheme assumes that at each step, users select only one suggested value to figure out resources containing it. To make faceted navigation more flexible and effective, this paper introduces a multi-select scheme where multiple suggested values can be selected at one step, and a selected value can be used to either retain or exclude the resources containing it. Previous algorithms for cost-driven value suggestion can hardly work well under our navigation scheme. Therefore, we propose to optimize the navigation cost using the Minimum Description Length principle, which can well balance the number of navigation steps and the number of suggested values per step under our new scheme. An emperical study demonstrates that our approach is more cost-saving and efficient than state-of-the-art approaches.
- Workflow (0.49)
- Research Report > Promising Solution (0.34)