Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits

Apr-29-2026, 23:33:25 GMT–Neural Information Processing Systems

We study the problem of regret minimization in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. We derive an instance-specific regret lower bound and characterize the minimal expected number of times each global action should be explored. This bound and the corresponding optimal exploration process are obtained by solving a combinatorial optimization problem whose set of variables and constraints exponentially grow with the number of agents, and cannot be exploited in the design of efficient algorithms. Inspired by Mean Field approximation techniques used in graphical models, we provide simple upper bounds of the regret lower bound. The corresponding optimization problems have a reduced number of variables and constraints. By tuning the latter, we may explore the trade-off between the achievable regret and the complexity of computing the corresponding exploration process. We devise Efficient Sampling for MAMAB (ESM), an algorithm whose regret asymptotically matches the approximated lower bounds. The regret and computational complexity of ESM are assessed numerically, using both synthetic and real-world experiments in radio communications networks.

algorithm, artificial intelligence, optimization problem, (15 more...)

Neural Information Processing Systems

Apr-29-2026, 23:33:25 GMT

Conferences PDF

Add feedback

Country:
- Europe > Sweden (0.14)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning
  - Optimization (1.00)
  - Agents > Agent Societies (0.48)

Duplicate Docs Excel Report

Title
d9c7c8bd6ad4cebb7d006e5109e0b682-Paper-Conference.pdf

Similar Docs Excel Report more

Title	Similarity	Source
None found