This search does not affect the computational complexity, which is $O(\nu_n D_E + S_E)$ for agent $n$ that computes $D_E$ parallel consensus steps and goes over a list of $S_E$ action profiles. Intuitively, we would need $E \geq K^N$ to find the optimal action profile even with no noise, which creates delays where agents have to wait for their average reward to go above their $\lambda^n$. In the multitasking robots game, if agent $n$ has $R^n_e = 0$, then the optimal action profile $a_e$ has to satisfy $a_{e,m} = n$ for all $m$. If $\lambda$ is a safe margin away from the boundary of $\mathcal{C}(G)$, then most agents will have $R^n_e = 0$ most of the time. Hence, their performance depends on the best action profile in $S_E$.
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for each agent $v$ an individual expected regret of $\widetilde{O}\left(\sqrt{\left(1+\frac{K}{\left|\mathcal{N}\left(v\right)\right|}\right)T}\right)$, where $T$ is the number of time steps, $K$ is the number of actions and $\mathcal{N}\left(v\right)$ is the set of neighbors of agent $v$ in the communication graph. We present algorithms both for the case that the communication graph is known to all the agents, and for the case that the graph is unknown. When the graph is unknown, each agent knows only the set of its neighbors and an upper bound on the total number of agents. The individual regret between the models differs only by a logarithmic factor.
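The per-agent bound above can be evaluated numerically to see how connectivity shapes individual regret; a minimal sketch (the function name is illustrative, and logarithmic factors are dropped):

```python
import math

def regret_bound(num_neighbors: int, K: int, T: int) -> float:
    """O-tilde individual regret bound sqrt((1 + K/|N(v)|) * T),
    ignoring logarithmic factors."""
    return math.sqrt((1 + K / num_neighbors) * T)

# Star graph on 5 agents: the hub has 4 neighbors, each leaf has 1.
K, T = 10, 100_000
hub = regret_bound(num_neighbors=4, K=K, T=T)
leaf = regret_bound(num_neighbors=1, K=K, T=T)
assert hub < leaf  # better-connected agents enjoy a smaller bound
```

As the neighborhood grows, the $K/|\mathcal{N}(v)|$ term shrinks and the bound approaches the full-information rate $\sqrt{T}$ up to constants.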
Neurosymbolic Transformers for Multi-Agent Communication
We study the problem of inferring communication structures that can solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is challenging due to the combinatorial nature of both the decision space and the objective; for instance, we cannot solve this problem by training neural networks using gradient descent. We propose a novel algorithm that synthesizes a control policy that combines a programmatic communication policy used to generate the communication graph with a transformer policy network used to choose actions. Our algorithm first trains the transformer policy, which implicitly generates a soft communication graph; then, it synthesizes a programmatic communication policy that hardens this graph, forming a neurosymbolic transformer. Our experiments demonstrate how our approach can synthesize policies that generate low-degree communication graphs while maintaining near-optimal performance.
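The hardening step can be illustrated by thresholding a soft (weighted) communication graph into a hard one that respects a degree cap; this is a hypothetical sketch of the idea, not the paper's actual synthesis procedure:

```python
import numpy as np

def harden_graph(soft: np.ndarray, max_degree: int) -> np.ndarray:
    """Keep, for each agent, only its max_degree highest-weight edges
    from a soft attention/communication matrix (zero diagonal assumed)."""
    n = soft.shape[0]
    hard = np.zeros_like(soft, dtype=bool)
    for i in range(n):
        top = np.argsort(soft[i])[::-1][:max_degree]  # strongest partners
        hard[i, top] = soft[i, top] > 0
    return hard

soft = np.array([[0.0, 0.9, 0.1, 0.4],
                 [0.8, 0.0, 0.3, 0.2],
                 [0.1, 0.5, 0.0, 0.7],
                 [0.6, 0.2, 0.9, 0.0]])
hard = harden_graph(soft, max_degree=2)
assert hard.sum(axis=1).max() <= 2  # per-agent degree respects the cap
```

Note this sketch caps each agent's outgoing edges independently; bounding the degree of an undirected communication graph would additionally require symmetrizing or matching the selections.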
MASPRM: Multi-Agent System Process Reward Model
Milad Yazdani, Mahdi Mostajabdaveh, Zirui Zhou, Ying Xiong
Practical deployment of Multi-Agent Systems (MAS) demands strong test-time performance, motivating methods that guide inference-time search and selectively spend compute to improve quality. We present the Multi-Agent System Process Reward Model (MASPRM). It assigns per-action, per-agent values to partial inter-agent transcripts and acts as an inference-time controller. MASPRM is trained from multi-agent Monte Carlo Tree Search (MCTS) rollouts without requiring step-level human annotations, by propagating returns to local targets. At inference, MASPRM guides step-level beam search and MCTS, focusing computation on promising branches and pruning early. On GSM8K and MATH, MASPRM-guided decoding, with an outcome reward model (ORM) applied to the final answer, improves exact match (EM) over a single straight-through MAS pass by $+30.7$ and $+22.9$ points, respectively. A MASPRM trained on GSM8K transfers zero-shot to MATH without retraining, adding $8.4$ EM points at the same budget. MASPRM is a plug-in value model that estimates per-agent progress and complements verifier-style decoders, enabling more reliable, compute-aware multi-agent reasoning. Code: https://github.com/milad1378yz/MASPRM
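The controller role described above, scoring partial transcripts and pruning weak branches, can be sketched as a generic value-guided beam search; `expand` and `value_fn` stand in for the agents and the process reward model, and both are illustrative assumptions rather than the MASPRM implementation:

```python
from typing import Callable, List

def prm_beam_search(
    root: str,
    expand: Callable[[str], List[str]],   # agent proposals for a partial transcript
    value_fn: Callable[[str], float],     # process reward model: scores partials
    beam_width: int,
    depth: int,
) -> str:
    """Keep only the beam_width highest-value partial transcripts per step."""
    beam = [root]
    for _ in range(depth):
        candidates = [c for s in beam for c in expand(s)]
        if not candidates:
            break
        candidates.sort(key=value_fn, reverse=True)
        beam = candidates[:beam_width]  # prune low-value branches early
    return max(beam, key=value_fn)

# Toy example: transcripts are strings; the value function rewards 'a's.
best = prm_beam_search(
    root="",
    expand=lambda s: [s + "a", s + "b"] if len(s) < 3 else [],
    value_fn=lambda s: s.count("a"),
    beam_width=2,
    depth=3,
)
assert best == "aaa"
```

In the paper's setting the final candidates would additionally be reranked by the outcome reward model on the finished answer.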