Optimal cross-learning for contextual bandits with unknown context distributions
Neural Information Processing Systems
We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss of the action it plays in every possible context, not just in the context of the current round. We specifically consider the setting where losses are chosen adversarially and contexts are sampled i.i.d.
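To make the feedback model concrete, here is a minimal Python sketch of a single round under the assumptions stated above. The adversary has fixed a loss table for the round, the context is drawn i.i.d. from a distribution the learner does not know, and the learner observes the played action's loss in every context. The names `cross_learning_round`, `losses`, `context_probs`, and `policy` are hypothetical. The sketch only illustrates what is observed; it is not the algorithm proposed in the paper.

```python
import random
from typing import Callable, List, Tuple


def cross_learning_round(
    losses: List[List[float]],       # hypothetical layout: losses[c][a] = adversary's loss for context c, action a
    context_probs: List[float],      # i.i.d. context distribution (unknown to the learner)
    policy: Callable[[int], int],    # hypothetical learner: maps the observed context to an action index
    rng: random.Random,
) -> Tuple[int, int, List[float]]:
    contexts = range(len(context_probs))
    # Nature draws this round's context i.i.d. from the context distribution.
    c = rng.choices(contexts, weights=context_probs, k=1)[0]
    # The learner picks an action after seeing only the realised context.
    a = policy(c)
    # Cross-learning feedback: the loss of the played action a in EVERY context,
    # not only in the realised context c.
    feedback = [losses[other][a] for other in contexts]
    return c, a, feedback


if __name__ == "__main__":
    table = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5]]  # 3 contexts x 2 actions (illustrative values)
    ctx, act, fb = cross_learning_round(table, [0.2, 0.3, 0.5], lambda _: 1, random.Random(0))
    print(ctx, act, fb)
```

Under ordinary contextual bandit feedback the learner would receive only `feedback[c]`. The remaining entries are the extra information that the cross-learning setting provides.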
May-25-2025, 07:18:45 GMT