Goto

Collaborating Authors

 multi-agent planning


MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning

Neural Information Processing Systems

Monte Carlo Tree Search (MCTS), which leverages Upper Confidence Bound for Trees (UCTs) to balance exploration and exploitation through randomized sampling, is instrumental to solving complex planning problems. However, for multi-agent planning, MCTS is confronted with a large combinatorial action space that often grows exponentially with the number of agents. As a result, the branching factor of MCTS during tree expansion also increases exponentially, making it very difficult to efficiently explore and exploit during tree search. To this end, we propose MALinZero, a new approach to leverage low-dimensional representational structures on joint-action returns and enable efficient MCTS in complex multi-agent planning. Our solution can be viewed as projecting the joint-action returns into the low-dimensional space representable using a contextual linear bandit problem formulation. We solve the contextual linear bandit problem with convex and $\mu$-smooth loss functions -- in order to place more importance on better joint actions and mitigate potential representational limitations -- and derive a linear Upper Confidence Bound applied to trees (LinUCT) to enable novel multi-agent exploration and exploitation in the low-dimensional space. We analyze the regret of MALinZero for low-dimensional reward functions and propose an $(1-\tfrac1e)$-approximation algorithm for the joint action selection by maximizing a sub-modular objective. MALinZero demonstrates state-of-the-art performance on multi-agent benchmarks such as matrix games, SMAC, and SMACv2, outperforming both model-based and model-free multi-agent reinforcement learning baselines with faster learning speed and better performance.


LLM+MAP: Bimanual Robot Task Planning using Large Language Models and Planning Domain Definition Language

arXiv.org Artificial Intelligence

Bimanual robotic manipulation provides significant versatility, but also presents an inherent challenge due to the complexity involved in the spatial and temporal coordination between two hands. Existing works predominantly focus on attaining human-level manipulation skills for robotic hands, yet little attention has been paid to task planning on long-horizon timescales. With their outstanding in-context learning and zero-shot generation abilities, Large Language Models (LLMs) have been applied and grounded in diverse robotic embodiments to facilitate task planning. However, LLMs still suffer from errors in long-horizon reasoning and from hallucinations in complex robotic tasks, lacking a guarantee of logical correctness when generating the plan. Previous works, such as LLM+P, extended LLMs with symbolic planners. However, none have been successfully applied to bimanual robots. New challenges inevitably arise in bimanual manipulation, necessitating not only effective task decomposition but also efficient task allocation. To address these challenges, this paper introduces LLM+MAP, a bimanual planning framework that integrates LLM reasoning and multi-agent planning, automating effective and efficient bimanual task planning. We conduct simulated experiments on various long-horizon manipulation tasks of differing complexity. Our method is built using GPT-4o as the backend, and we compare its performance against plans generated directly by LLMs, including GPT-4o, V3 and also recent strong reasoning models o1 and R1. By analyzing metrics such as planning time, success rate, group debits, and planning-step reduction rate, we demonstrate the superior performance of LLM+MAP, while also providing insights into robotic reasoning. Code is available at https://github.com/Kchu/LLM-MAP.


GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS

arXiv.org Artificial Intelligence

Multi-agent learning algorithms have been successful at generating superhuman planning in a wide variety of games but have had little impact on the design of deployed multi-agent planners. A key bottleneck in applying these techniques to multi-agent planning is that they require billions of steps of experience. To enable the study of multi-agent planning at this scale, we present GPUDrive, a GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine that can generate over a million steps of experience per second. Observation, reward, and dynamics functions are written directly in C++, allowing users to define complex, heterogeneous agent behaviors that are lowered to high-performance CUDA. We show that using GPUDrive we are able to effectively train reinforcement learning agents over many scenes in the Waymo Motion dataset, yielding highly effective goal-reaching agents in minutes for individual scenes and generally capable agents in a few hours.


Fairness in Multi-Agent Planning

arXiv.org Artificial Intelligence

In cooperative Multi-Agent Planning (MAP), a set of goals has to be achieved by a set of agents. Independently of whether they perform a pre-assignment of goals to agents or they directly search for a solution without any goal assignment, most previous works did not focus on a fair distribution/achievement of goals by agents. This paper adapts well-known fairness schemes to MAP, and introduces two novel approaches to generate cost-aware fair plans. The first one solves an optimization problem to pre-assign goals to agents, and then solves a centralized MAP task using that assignment. The second one consists of a planning-based compilation that allows solving the joint problem of goal assignment and planning while taking into account the given fairness scheme. Empirical results in several standard MAP benchmarks show that these approaches outperform different baselines. They also show that there is no need to sacrifice much plan cost to generate fair plans.


Cooperative Behavior Planning for Automated Driving using Graph Neural Networks

arXiv.org Artificial Intelligence

Urban intersections are prone to delays and inefficiencies due to static precedence rules and occlusions limiting the view on prioritized traffic. Existing approaches to improve traffic flow, widely known as automatic intersection management systems, are mostly based on non-learning reservation schemes or optimization algorithms. Machine learning-based techniques show promising results in planning for a single ego vehicle. This work proposes to leverage machine learning algorithms to optimize traffic flow at urban intersections by jointly planning for multiple vehicles. Learning-based behavior planning poses several challenges, demanding for a suited input and output representation as well as large amounts of ground-truth data. We address the former issue by using a flexible graph-based input representation accompanied by a graph neural network. This allows to efficiently encode the scene and inherently provide individual outputs for all involved vehicles. To learn a sensible policy, without relying on the imitation of expert demonstrations, the cooperative planning task is considered as a reinforcement learning problem. We train and evaluate the proposed method in an open-source simulation environment for decision making in automated driving. Compared to a first-in-first-out scheme and traffic governed by static priority rules, the learned planner shows a significant gain in flow rate, while reducing the number of induced stops. In addition to synthetic simulations, the approach is also evaluated based on real-world traffic data taken from the publicly available inD dataset.


EgoPlan : A framework for multi-agent planning using single agent planners - Strathprints

#artificialintelligence

Planning problems are, in general, PSPACE-complete; large problems, especially multi-agent problems with required coordination, can be intractable or impractical to solve. Factored planning and multi-agent planning both address this by separating multi-agent problems into tractable sub-problems, but there are limitations in the expressivity of existing planners and in the ability to handle tightly coupled multi-agent problems. This paper presents EGOPLAN, a framework which factors a multi-agent problem into related sub-problems which are solved by iteratively calling on a single agent planner. EGOPLAN is evaluated on a multi-robot test domain with durative actions, required coordination, and temporal constraints, comparing the performance of a temporal planner, OPTIC-CPLEX, with and without EGOPLAN. Our results show that for our test domain, using EGOPLAN allows OPTIC-CPLEX to solve problems that are twice as complex as it can solve without EGOPLAN, and to solve complex problems significantly faster.


Tožička

AAAI Conferences

Multi-agent planning using MA-STRIPS-related models is often motivated by the preservation of private information. Such motivation is not only natural for multi-agent systems, but it is one of the main reasons, why multi-agent planning (MAP) problems cannot be solved centrally. In this paper, we analyze privacy-preserving multi-agent planning (PP-MAP) from the perspective of secure multiparty computation (MPC). We discuss the concept of strong privacy and its implications and present two variants of a novel planner, provably strong privacy-preserving in general. As the main contribution, we formulate the limits of strong privacy-preserving planning in the terms of privacy, completeness and efficiency and show that, for a wide class of planning algorithms, all three properties are not achievable at once. Moreover, we provide a restricted variant of strong privacy based on equivalence classes of planning problems and show that an efficient, complete and strong privacy-preserving planner exists for such restriction.


Partial Disclosure of Private Dependencies in Privacy Preserving Planning

arXiv.org Artificial Intelligence

In collaborative privacy preserving planning (CPPP), a group of agents jointly creates a plan to achieve a set of goals while preserving each others' privacy. During planning, agents often reveal the private dependencies between their public actions to other agents, that is, which public action facilitates the preconditions of another public action. Previous work in CPPP does not limit the disclosure of such dependencies. In this paper, we explicitly limit the amount of disclosed dependencies, allowing agents to publish only a part of their private dependencies. We investigate different strategies for deciding which dependencies to publish, and how they affect the ability to find solutions. We evaluate the ability of two solvers -- distribute forward search and centralized planning based on a single-agent projection -- to produce plans under this constraint. Experiments over standard CPPP domains show that the proposed dependency-sharing strategies enable generating plans while sharing only a small fraction of all private dependencies.


Differentially Private Multi-Agent Planning for Logistic-like Problems

arXiv.org Artificial Intelligence

Planning is one of the main approaches used to improve agents' working efficiency by making plans beforehand. However, during planning, agents face the risk of having their private information leaked. This paper proposes a novel strong privacy-preserving planning approach for logistic-like problems. This approach outperforms existing approaches by addressing two challenges: 1) simultaneously achieving strong privacy, completeness and efficiency, and 2) addressing communication constraints. These two challenges are prevalent in many real-world applications including logistics in military environments and packet routing in networks. To tackle these two challenges, our approach adopts the differential privacy technique, which can both guarantee strong privacy and control communication overhead. To the best of our knowledge, this paper is the first to apply differential privacy to the field of multi-agent planning as a means of preserving the privacy of agents for logistic-like problems. We theoretically prove the strong privacy and completeness of our approach and empirically demonstrate its efficiency. We also theoretically analyze the communication overhead of our approach and illustrate how differential privacy can be used to control it.


Trial-Based Heuristic Tree-Search for Distributed Multi-Agent Planning

AAAI Conferences

We present a novel search scheme for privacy-preserving multi-agent planning. Inspired by UCT search, the scheme is based on growing an asynchronous search tree by running repeated trials through the tree. We describe key differences to classical multi-agent forward search, discuss theoretical properties of the presented approach, and evaluate it based on benchmarks from the CoDMAP competition. As a secondary contribution, we describe a technique that extends the regular search approach by small explorative trials which are performed subsequent to each node expansion. We show that this technique significantly increases the number of problems solved for all algorithms considered, including MAFS.