Bandits with Feedback Graphs and Switching Costs
Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri
Neural Information Processing Systems
We study the adversarial multi-armed bandit problem where the learner is supplied with partial observations modeled by a feedback graph and where shifting to a new action incurs a fixed switching cost.
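The interaction protocol described in the abstract can be sketched roughly as follows. This is an illustrative simulation of the problem setting only, not the authors' algorithm. The names `play`, `losses`, `graph`, `policy`, and `switch_cost` are hypothetical, and the conventions that the first round is free of switching cost and that losses are fixed in advance (an oblivious adversary) are assumptions made for the sketch.

```python
def play(losses, graph, policy, switch_cost=1.0):
    """Simulate the bandit protocol with a feedback graph and switching costs.

    losses[t][a]  : loss of action a at round t, fixed in advance (assumption).
    graph[a]      : set of actions whose losses are revealed when a is played.
    policy(t, obs): returns the action for round t given past observations.
    Returns the cumulative loss including switching costs.
    """
    total = 0.0
    prev = None
    observations = []
    for t, round_losses in enumerate(losses):
        action = policy(t, observations)
        total += round_losses[action]
        # Pay a fixed cost whenever the action changes (assumed free in round 0).
        if prev is not None and action != prev:
            total += switch_cost
        # Partial feedback: only the losses of out-neighbors of the played action.
        observations.append({b: round_losses[b] for b in graph[action]})
        prev = action
    return total


# Two actions; playing 0 reveals both losses, playing 1 reveals only its own.
losses = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
graph = {0: {0, 1}, 1: {1}}

stay_cost = play(losses, graph, lambda t, obs: 0)
alternate_cost = play(losses, graph, lambda t, obs: t % 2)
print(stay_cost, alternate_cost)
```

In this toy instance the alternating policy always picks the zero-loss action but pays for every switch, which illustrates the tension between chasing low losses and avoiding switching costs.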
Mar-26-2025, 23:13:30 GMT