Queue Up Your Regrets: Achieving the Dynamic Capacity Region of Multiplayer Bandits

Dec-23-2025, 17:16:12 GMT, Neural Information Processing Systems

Consider $N$ cooperative agents that interact for $T$ turns. In each turn, each agent $n$ takes an action $a_{n}$ and receives a stochastic reward $r_{n}\left(a_{1},\ldots,a_{N}\right)$ that depends on the actions of all agents. Agents cannot observe the actions of other agents and do not even know their own reward function. The agents can communicate with their neighbors on a connected graph $G$ with diameter $d\left(G\right)$. We want each agent $n$ to achieve an expected average reward of at least $\lambda_{n}$ over time, for a given quality of service (QoS) vector $\boldsymbol{\lambda}$. Not every QoS vector $\boldsymbol{\lambda}$ is achievable.
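
To make the setup concrete, here is a minimal Python sketch that rests on assumptions that do not come from the abstract. It uses a hypothetical collision reward model, in which an agent earns a Bernoulli reward with mean mu[arm] only when no other agent picks the same arm. It also uses a fixed, centrally planned schedule of joint actions, and the helper names (`graph_diameter`, `expected_reward`, `simulate`, `meets_qos`, `sum_rate_bound_ok`) are chosen only for illustration. This is not the paper's algorithm. The agents here do not learn, and the schedule ignores the rules that agents cannot observe each other and may only talk to neighbors. The sketch only shows what $d\left(G\right)$ measures, what it means for agent $n$ to reach $\lambda_{n}$ on average, and why some $\boldsymbol{\lambda}$ cannot be achieved.

```python
import itertools
import random
from collections import deque


def graph_diameter(adj):
    """d(G): the longest shortest-path distance in a connected undirected graph.

    `adj` maps each node to a list of its neighbors; BFS runs from every node.
    """
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is not connected")
        return max(dist.values())

    return max(eccentricity(s) for s in adj)


# ASSUMPTION (illustration only): a collision reward model. Agent n earns a
# Bernoulli reward with mean mu[arm] if no other agent chose its arm, else 0.
def expected_reward(n, joint_action, mu):
    arm = joint_action[n]
    return 0.0 if joint_action.count(arm) > 1 else mu[arm]


def sample_reward(n, joint_action, mu, rng):
    # Bernoulli draw whose mean is expected_reward(n, joint_action, mu).
    return 1.0 if rng.random() < expected_reward(n, joint_action, mu) else 0.0


def simulate(schedule, mu, num_agents, T, seed=0):
    """Cycle through a fixed schedule of joint actions for T turns and return each
    agent's empirical average reward. No learning, no communication."""
    rng = random.Random(seed)
    totals = [0.0] * num_agents
    for t in range(T):
        joint = schedule[t % len(schedule)]
        for n in range(num_agents):
            totals[n] += sample_reward(n, joint, mu, rng)
    return [s / T for s in totals]


def meets_qos(avg_rewards, qos):
    """True if every agent n reached at least lambda_n."""
    return all(r >= lam for r, lam in zip(avg_rewards, qos))


def sum_rate_bound_ok(qos, mu, num_agents):
    """Necessary, not sufficient: sum(lambda) cannot exceed the largest expected
    total reward that any single joint action yields per turn."""
    best = max(
        sum(expected_reward(n, a, mu) for n in range(num_agents))
        for a in itertools.product(range(len(mu)), repeat=num_agents)
    )
    return sum(qos) <= best


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1]}      # path graph on 3 nodes
    print(graph_diameter(path))               # 2

    mu = [0.9, 0.5]
    schedule = [(0, 1), (1, 0)]               # the two agents swap arms each turn
    avg = simulate(schedule, mu, num_agents=2, T=10_000)
    print(avg, meets_qos(avg, [0.6, 0.6]))
    print(sum_rate_bound_ok([0.8, 0.8], mu, num_agents=2))  # False: 1.6 > 1.4
```

The sum-rate check is only a necessary condition. The expected total reward per turn can never exceed the best expected total that a single joint action provides. In the example, $\boldsymbol{\lambda}=(0.8, 0.8)$ asks for 1.6 per turn, while no joint action yields more than $0.9 + 0.5 = 1.4$ in expectation. Under this assumed model, no policy can meet that QoS vector, whatever the agents learn or communicate.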

agent, dynamic capacity region, name change