Collaborative Linear Bandits with Adversarial Agents: Near-Optimal Regret Bounds
Neural Information Processing Systems
We consider a linear stochastic bandit problem involving M agents that can collaborate via a central server to minimize regret. A fraction \alpha of these agents are adversarial and can act arbitrarily, leading to the following tension: while collaboration can potentially reduce regret, it can also disrupt the process of learning due to adversaries. In this work, we provide a fundamental understanding of this tension by designing new algorithms that balance the exploration-exploitation trade-off via carefully constructed robust confidence intervals. We also complement our algorithms with tight analyses. First, we develop a robust collaborative phased elimination algorithm that achieves \tilde{O}\left(\left(\alpha + 1/\sqrt{M}\right)\sqrt{dT}\right) regret for each good agent; here, d is the model dimension and T is the horizon.
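The two terms in the regret bound can be read separately; a sketch of this decomposition, assuming the bound factors as stated in the abstract:

```latex
% Per-good-agent regret, split into its two sources:
%   - the 1/\sqrt{M} term reflects the collaborative speedup across M agents,
%   - the \alpha term is the additional cost incurred due to adversaries.
\tilde{O}\!\left(\left(\alpha + \frac{1}{\sqrt{M}}\right)\sqrt{dT}\right)
  \;=\; \underbrace{\tilde{O}\!\left(\sqrt{\frac{dT}{M}}\right)}_{\text{collaboration term}}
  \;+\; \underbrace{\tilde{O}\!\left(\alpha\sqrt{dT}\right)}_{\text{price of adversaries}}
```

When \alpha = 0 (no adversaries), the bound recovers the \sqrt{dT/M} rate, i.e., an M-fold reduction over a single agent's \sqrt{dT} regret; as \alpha grows, the adversarial term dominates and the benefit of collaboration diminishes.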