Learning to Coordinate Under Threshold Rewards: A Cooperative Multi-Agent Bandit Framework
Michael Ledford, William Regli
arXiv.org Artificial Intelligence
Cooperative multi-agent systems often face tasks that require coordinated actions under uncertainty. While multi-armed bandit (MAB) problems provide a powerful framework for decentralized learning, most prior work assumes individually attainable rewards. We address the challenging setting where rewards are threshold-activated: an arm yields a payoff only when a minimum number of agents pull it simultaneously, with this threshold unknown in advance. Complicating matters further, some arms are decoys, requiring coordination to activate yet yielding no reward, which introduces a new challenge of wasted joint exploration. We introduce Threshold-Coop-UCB (T-Coop-UCB), a decentralized algorithm that enables agents to jointly learn activation thresholds and reward distributions, forming effective coalitions without centralized control. Empirical results show that T-Coop-UCB consistently outperforms baseline methods in cumulative reward, regret, and coordination metrics, achieving near-Oracle performance. Our findings underscore the importance of joint threshold learning and decoy avoidance for scalable, decentralized cooperation in complex multi-agent systems.
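The T-Coop-UCB algorithm itself is not reproduced in this listing, but the setting the abstract describes can be sketched with a toy environment and a simplified joint-exploration strategy. All names below (`ThresholdBandit`, `joint_explore_then_commit`) are hypothetical illustrations, not the authors' implementation; the learner here is a naive explore-then-commit baseline, not the UCB-based method of the paper.

```python
import random
from collections import Counter


class ThresholdBandit:
    """Toy threshold-activated bandit (illustrative, not the paper's setup).

    An arm pays out only when at least thresholds[a] agents pull arm a in
    the same round; thresholds are hidden from the agents. A decoy arm has
    a positive threshold but zero mean reward."""

    def __init__(self, thresholds, means, seed=0):
        self.thresholds = thresholds  # hidden activation threshold per arm
        self.means = means            # mean reward per arm (0.0 for decoys)
        self.rng = random.Random(seed)

    def pull(self, choices):
        """choices: one arm index per agent -> list of per-agent rewards."""
        counts = Counter(choices)
        rewards = []
        for a in choices:
            if counts[a] >= self.thresholds[a]:
                # activated: noisy reward around the arm's mean
                rewards.append(max(0.0, self.rng.gauss(self.means[a], 0.05)))
            else:
                rewards.append(0.0)  # coalition too small: no activation
        return rewards


def joint_explore_then_commit(env, n_agents, n_arms, explore_rounds=20):
    """All agents jointly visit each arm (full-coalition exploration), then
    commit to the arm with the highest observed per-agent mean reward.
    Decoys are rejected automatically: they activate yet pay ~nothing."""
    est = [0.0] * n_arms
    for a in range(n_arms):
        total = 0.0
        for _ in range(explore_rounds):
            total += sum(env.pull([a] * n_agents))
        est[a] = total / (explore_rounds * n_agents)
    best = max(range(n_arms), key=lambda a: est[a])
    return best, est


# Arm 0 is a decoy (needs 2 agents, pays nothing); arm 1 needs 2 agents
# and pays well; arm 2 is individually attainable but low-paying.
env = ThresholdBandit(thresholds=[2, 2, 1], means=[0.0, 1.0, 0.2], seed=1)
best, est = joint_explore_then_commit(env, n_agents=3, n_arms=3)
print(best)
```

With a full-coalition sweep, the agents observe every arm's activated reward, so the high-paying threshold arm is identified and the decoy is avoided; the paper's contribution is doing this without such centralized lock-step exploration.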
Jun-23-2025