Optimal cross-learning for contextual bandits with unknown context distributions

Neural Information Processing Systems 

We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action they play in all possible contexts, not just the context of the current round. We specifically consider the setting where losses are chosen adversarially and contexts are sampled i.i.d. from an unknown distribution.
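
To make the feedback model concrete, here is a minimal sketch of a single round under cross-learning feedback. It illustrates only the interaction protocol described above, not the algorithm of the paper. The names `cross_learning_round`, `loss_table`, `context_probs`, and `policy` are hypothetical, and the numbers are made up for illustration. In the adversarial setting the loss table may change from round to round; one fixed table is used here for a single round.

```python
import random


def cross_learning_round(loss_table, context_probs, policy, rng):
    """Simulate one round and return (context, action, feedback).

    loss_table[c][a] is this round's loss for action a in context c
    (hypothetical layout, chosen for illustration).
    """
    contexts = list(range(len(context_probs)))
    # The context is drawn i.i.d.; the learner never sees context_probs itself.
    context = rng.choices(contexts, weights=context_probs, k=1)[0]
    action = policy(context)
    # Cross-learning feedback: the loss of the played action in EVERY context,
    # not only in the context that was actually drawn.
    feedback = [loss_table[c][action] for c in contexts]
    return context, action, feedback


rng = random.Random(0)
loss_table = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]  # 3 contexts, 2 actions
context_probs = [0.5, 0.3, 0.2]  # unknown to the learner


def policy(context):
    # Placeholder learner that always plays action 0.
    return 0


context, action, feedback = cross_learning_round(
    loss_table, context_probs, policy, rng
)
print(context, action, feedback)
```

Under standard bandit feedback the learner would see only `loss_table[context][action]`; here it sees one loss per context for the action it played, while the losses of the other actions stay hidden.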
