Bandits with Feedback Graphs and Switching Costs

Raman Arora, Teodor Vanislavov Marinov, Mehryar Mohri

Neural Information Processing Systems 

We study the adversarial multi-armed bandit problem in which the learner receives partial observations modeled by a feedback graph and switching to a new action incurs a fixed switching cost.
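
As a rough illustration of this setting (not the authors' algorithm), the interaction can be sketched as follows. The sketch assumes that the feedback graph is given as a map from each action to the set of actions whose losses it reveals, that the first round does not count as a switch, and that the learner's cumulative cost is its total loss plus the switching cost times the number of switches. The function name `simulate_protocol` and all parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple


def simulate_protocol(
    losses: Sequence[Sequence[float]],  # losses[t][a]: adversary's loss for action a in round t
    graph: Dict[int, Set[int]],         # graph[a]: actions whose losses are revealed when a is played (assumed encoding)
    actions: Sequence[int],             # action played by the learner in each round
    switching_cost: float = 1.0,        # fixed cost paid whenever the action changes
) -> Tuple[float, int, List[Dict[int, float]]]:
    """Replay a fixed action sequence and return (cumulative cost, number of switches, observations)."""
    total_loss = 0.0
    switches = 0
    observations: List[Dict[int, float]] = []
    previous: Optional[int] = None

    for t, action in enumerate(actions):
        # The learner always suffers the loss of the action it plays.
        total_loss += losses[t][action]

        # A switch is counted when the action differs from the previous round's
        # action (assumption: the first round is free).
        if previous is not None and action != previous:
            switches += 1

        # Partial feedback: only the losses of the played action's neighbors
        # in the feedback graph are revealed.
        observations.append({b: losses[t][b] for b in graph[action]})
        previous = action

    return total_loss + switching_cost * switches, switches, observations
```

Under this encoding, a graph in which every action reveals only itself corresponds to standard bandit feedback, while a graph in which every action reveals all actions corresponds to full-information feedback.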