Bandits with Feedback Graphs and Switching Costs
Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri
Neural Information Processing Systems
We study the adversarial multi-armed bandit problem where the learner is supplied with partial observations modeled by a feedback graph and where shifting to a new action incurs a fixed switching cost.
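The setting described above can be made concrete with a small simulation of the interaction protocol. This is only a minimal sketch under stated assumptions, not the authors' algorithm: the function name `play`, the representation of the feedback graph as a dictionary from each action to the set of actions whose losses it reveals, the unit default switching cost, and the convention that the first round incurs no switching cost are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple


def play(
    actions: List[int],
    losses: List[List[float]],
    graph: Dict[int, Set[int]],
    switch_cost: float = 1.0,
) -> Tuple[float, List[Dict[int, float]]]:
    """Simulate the protocol for a fixed sequence of actions.

    losses[t][i] is the adversary's loss for action i at round t.
    graph[i] is the set of actions whose losses are revealed when i is played
    (assumption: self-loops must be listed explicitly if desired).
    Returns the total cost paid and the per-round observations.
    """
    total = 0.0
    observations: List[Dict[int, float]] = []
    previous = None
    for t, action in enumerate(actions):
        # The learner always pays the loss of the action it plays.
        total += losses[t][action]
        # A fixed cost is added whenever the action changes between rounds
        # (assumption: no cost is charged in the first round).
        if previous is not None and action != previous:
            total += switch_cost
        # Partial feedback: only the neighbors of the played action are revealed.
        observations.append({j: losses[t][j] for j in graph[action]})
        previous = action
    return total, observations


# Hypothetical three-action instance.
example_losses = [[0.5, 0.0, 1.0], [0.0, 1.0, 0.5], [1.0, 0.5, 0.0]]
example_graph = {0: {0, 1}, 1: {1}, 2: {0, 1, 2}}
example_total, example_obs = play([0, 0, 2], example_losses, example_graph)
```

In this toy instance the learner pays the losses of the actions it plays plus one switching cost for the move from action 0 to action 2, and in each round it sees only the losses of the actions adjacent to the one it played.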
Jan-27-2025, 05:50:41 GMT