Optimal cross-learning for contextual bandits with unknown context distributions

Neural Information Processing Systems 

We consider the problem of designing contextual bandit algorithms in the "cross-learning" setting of Balseiro et al., where the learner observes the loss for the action
