Queue Up Your Regrets: Achieving the Dynamic Capacity Region of Multiplayer Bandits

Oct-9-2024, 10:39:49 GMT–Neural Information Processing Systems

Consider N cooperative agents such that, for T turns, each agent n takes an action a_n and receives a stochastic reward r_n(a_1, \ldots, a_N). Agents cannot observe the actions of other agents and do not even know their own reward functions. The agents can communicate with their neighbors on a connected graph G with diameter d(G). We want each agent n to achieve an expected average reward of at least \lambda_n over time, for a given quality of service (QoS) vector \boldsymbol{\lambda}. By giving up on immediate reward, knowing that the other agents will compensate later, agents can improve their achievable capacity region.
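To make the last point concrete, here is a minimal toy sketch of why taking turns can enlarge the set of achievable QoS vectors. It is not the paper's algorithm: it assumes a centralized controller, deterministic and known rewards, and a standard max-weight rule over virtual deficit queues. The names `simulate`, `REWARDS`, and `QOS` are hypothetical and chosen only for illustration.

```python
def simulate(reward_table, qos, turns):
    """Toy max-weight time sharing with virtual deficit queues.

    Assumptions (not from the paper): rewards are deterministic and known,
    and a single controller picks the joint action each turn.
    """
    n_agents = len(qos)
    queues = [0.0] * n_agents   # accumulated QoS deficit per agent
    totals = [0.0] * n_agents   # accumulated reward per agent
    for _ in range(turns):
        # Pick the joint action whose rewards best serve the agents
        # that are currently furthest behind their targets.
        best = max(
            reward_table,
            key=lambda r: sum(q * x for q, x in zip(queues, r)),
        )
        for n in range(n_agents):
            totals[n] += best[n]
            # Deficit grows by the target and shrinks by the reward received.
            queues[n] = max(queues[n] + qos[n] - best[n], 0.0)
    return [t / turns for t in totals]


# Two joint actions: each one rewards only a single agent.
REWARDS = [(1.0, 0.0), (0.0, 1.0)]
QOS = (0.4, 0.4)

averages = simulate(REWARDS, QOS, turns=1000)
print(averages)
```

In this toy instance no single joint action gives both agents at least 0.4, yet alternating between the two actions meets both targets on average. Each agent accepts a zero reward on some turns because the other agent compensates on the next one.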


