Optimal Algorithms for Stochastic Contextual Preference Bandits

Neural Information Processing Systems 

We consider the problem of preference bandits in the contextual setting.